Mirror of https://github.com/gryf/ebook-converter.git (synced 2026-03-06 09:15:55 +01:00)
Initial import
.gitignore (vendored): new file, 1 addition
@@ -0,0 +1 @@
__pycache__
LICENSE: new file, 674 additions
@@ -0,0 +1,674 @@
                    GNU GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The GNU General Public License is a free, copyleft license for
software and other kinds of works.

  The licenses for most software and other practical works are designed
to take away your freedom to share and change the works.  By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.  We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors.  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.

  To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights.  Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received.  You must make sure that they, too, receive
or can get the source code.  And you must show them these terms so they
know their rights.

  Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.

  For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software.  For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.

  Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so.  This is fundamentally incompatible with the aim of
protecting users' freedom to change the software.  The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable.  Therefore, we
have designed this version of the GPL to prohibit the practice for those
products.  If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.

  Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary.  To prevent this, the GPL assures that
patents cannot be used to render the program non-free.

  The precise terms and conditions for copying, distribution and
modification follow.

                       TERMS AND CONDITIONS

  0. Definitions.

  "This License" refers to version 3 of the GNU General Public License.

  "Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

  "The Program" refers to any copyrightable work licensed under this
License.  Each licensee is addressed as "you".  "Licensees" and
"recipients" may be individuals or organizations.

  To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy.  The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.

  A "covered work" means either the unmodified Program or a work based
on the Program.

  To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy.  Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.

  To "convey" a work means any kind of propagation that enables other
parties to make or receive copies.  Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.

  An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License.  If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

  1. Source Code.

  The "source code" for a work means the preferred form of the work
for making modifications to it.  "Object code" means any non-source
form of a work.

  A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.

  The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form.  A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.

  The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities.  However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work.  For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.

  The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.

  The Corresponding Source for a work in source code form is that
same work.

  2. Basic Permissions.

  All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met.  This License explicitly affirms your unlimited
permission to run the unmodified Program.  The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work.  This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.

  You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force.  You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright.  Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.

  Conveying under any other circumstances is permitted solely under
the conditions stated below.  Sublicensing is not allowed; section 10
makes it unnecessary.

  3. Protecting Users' Legal Rights From Anti-Circumvention Law.

  No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.

  When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.

  4. Conveying Verbatim Copies.

  You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.

  You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.

  5. Conveying Modified Source Versions.

  You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:

    a) The work must carry prominent notices stating that you modified
    it, and giving a relevant date.

    b) The work must carry prominent notices stating that it is
    released under this License and any conditions added under section
    7.  This requirement modifies the requirement in section 4 to
    "keep intact all notices".

    c) You must license the entire work, as a whole, under this
    License to anyone who comes into possession of a copy.  This
    License will therefore apply, along with any applicable section 7
    additional terms, to the whole of the work, and all its parts,
    regardless of how they are packaged.  This License gives no
    permission to license the work in any other way, but it does not
    invalidate such permission if you have separately received it.

    d) If the work has interactive user interfaces, each must display
    Appropriate Legal Notices; however, if the Program has interactive
    interfaces that do not display Appropriate Legal Notices, your
    work need not make them do so.

  A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit.  Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

  6. Conveying Non-Source Forms.

  You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:

    a) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by the
    Corresponding Source fixed on a durable physical medium
    customarily used for software interchange.

    b) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by a
    written offer, valid for at least three years and valid for as
    long as you offer spare parts or customer support for that product
    model, to give anyone who possesses the object code either (1) a
    copy of the Corresponding Source for all the software in the
    product that is covered by this License, on a durable physical
    medium customarily used for software interchange, for a price no
    more than your reasonable cost of physically performing this
    conveying of source, or (2) access to copy the
    Corresponding Source from a network server at no charge.

    c) Convey individual copies of the object code with a copy of the
    written offer to provide the Corresponding Source.  This
    alternative is allowed only occasionally and noncommercially, and
    only if you received the object code with such an offer, in accord
    with subsection 6b.

    d) Convey the object code by offering access from a designated
    place (gratis or for a charge), and offer equivalent access to the
    Corresponding Source in the same way through the same place at no
    further charge.  You need not require recipients to copy the
    Corresponding Source along with the object code.  If the place to
    copy the object code is a network server, the Corresponding Source
    may be on a different server (operated by you or a third party)
    that supports equivalent copying facilities, provided you maintain
    clear directions next to the object code saying where to find the
    Corresponding Source.  Regardless of what server hosts the
    Corresponding Source, you remain obligated to ensure that it is
    available for as long as needed to satisfy these requirements.

    e) Convey the object code using peer-to-peer transmission, provided
    you inform other peers where the object code and Corresponding
    Source of the work are being offered to the general public at no
    charge under subsection 6d.

  A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.

  A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling.  In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage.  For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product.  A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

  "Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source.  The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.

  If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information.  But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).

  The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed.  Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.

  Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.

  7. Additional Terms.

  "Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law.  If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

  When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it.  (Additional permissions may be written to require their own
removal in certain cases when you modify the work.)  You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.

  Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:

    a) Disclaiming warranty or limiting liability differently from the
    terms of sections 15 and 16 of this License; or

    b) Requiring preservation of specified reasonable legal notices or
    author attributions in that material or in the Appropriate Legal
    Notices displayed by works containing it; or

    c) Prohibiting misrepresentation of the origin of that material, or
    requiring that modified versions of such material be marked in
    reasonable ways as different from the original version; or

    d) Limiting the use for publicity purposes of names of licensors or
    authors of the material; or

    e) Declining to grant rights under trademark law for use of some
    trade names, trademarks, or service marks; or

    f) Requiring indemnification of licensors and authors of that
    material by anyone who conveys the material (or modified versions of
    it) with contractual assumptions of liability to the recipient, for
    any liability that these contractual assumptions directly impose on
    those licensors and authors.

  All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10.  If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term.  If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.

  If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.

  Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.

  8. Termination.

  You may not propagate or modify a covered work except as expressly
provided under this License.  Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).

  However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

  Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

  Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License.  If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.

  9. Acceptance Not Required for Having Copies.

  You are not required to accept this License in order to receive or
run a copy of the Program.  Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance.  However,
nothing other than this License grants you permission to propagate or
modify any covered work.  These actions infringe copyright if you do
not accept this License.  Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.

  10. Automatic Licensing of Downstream Recipients.

  Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License.  You are not responsible
for enforcing compliance by third parties with this License.

  An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations.  If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.

  You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License.  For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.

  11. Patents.

  A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based.  The
work thus licensed is called the contributor's "contributor version".

  A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version.  For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.

  Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.

  In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement).  To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.

  If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients.  "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.

  If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.

  A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License.  You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.

  Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.

  12. No Surrender of Others' Freedom.

  If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
|
||||||
|
excuse you from the conditions of this License. If you cannot convey a
|
||||||
|
covered work so as to satisfy simultaneously your obligations under this
|
||||||
|
License and any other pertinent obligations, then as a consequence you may
|
||||||
|
not convey it at all. For example, if you agree to terms that obligate you
|
||||||
|
to collect a royalty for further conveying from those to whom you convey
|
||||||
|
the Program, the only way you could satisfy both those terms and this
|
||||||
|
License would be to refrain entirely from conveying the Program.
|
||||||
|
|
||||||
|
13. Use with the GNU Affero General Public License.
|
||||||
|
|
||||||
|
Notwithstanding any other provision of this License, you have
|
||||||
|
permission to link or combine any covered work with a work licensed
|
||||||
|
under version 3 of the GNU Affero General Public License into a single
|
||||||
|
combined work, and to convey the resulting work. The terms of this
|
||||||
|
License will continue to apply to the part which is the covered work,
|
||||||
|
but the special requirements of the GNU Affero General Public License,
|
||||||
|
section 13, concerning interaction through a network will apply to the
|
||||||
|
combination as such.
|
||||||
|
|
||||||
|
14. Revised Versions of this License.
|
||||||
|
|
||||||
|
The Free Software Foundation may publish revised and/or new versions of
|
||||||
|
the GNU General Public License from time to time. Such new versions will
|
||||||
|
be similar in spirit to the present version, but may differ in detail to
|
||||||
|
address new problems or concerns.
|
||||||
|
|
||||||
|
Each version is given a distinguishing version number. If the
|
||||||
|
Program specifies that a certain numbered version of the GNU General
|
||||||
|
Public License "or any later version" applies to it, you have the
|
||||||
|
option of following the terms and conditions either of that numbered
|
||||||
|
version or of any later version published by the Free Software
|
||||||
|
Foundation. If the Program does not specify a version number of the
|
||||||
|
GNU General Public License, you may choose any version ever published
|
||||||
|
by the Free Software Foundation.
|
||||||
|
|
||||||
|
If the Program specifies that a proxy can decide which future
|
||||||
|
versions of the GNU General Public License can be used, that proxy's
|
||||||
|
public statement of acceptance of a version permanently authorizes you
|
||||||
|
to choose that version for the Program.
|
||||||
|
|
||||||
|
Later license versions may give you additional or different
|
||||||
|
permissions. However, no additional obligations are imposed on any
|
||||||
|
author or copyright holder as a result of your choosing to follow a
|
||||||
|
later version.
|
||||||
|
|
||||||
|
15. Disclaimer of Warranty.
|
||||||
|
|
||||||
|
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
|
||||||
|
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
|
||||||
|
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
|
||||||
|
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||||
|
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
|
||||||
|
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
|
||||||
|
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
|
||||||
|
|
||||||
|
16. Limitation of Liability.
|
||||||
|
|
||||||
|
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
|
||||||
|
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
|
||||||
|
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
|
||||||
|
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
|
||||||
|
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
|
||||||
|
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
|
||||||
|
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
|
||||||
|
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
|
||||||
|
SUCH DAMAGES.
|
||||||
|
|
||||||
|
17. Interpretation of Sections 15 and 16.
|
||||||
|
|
||||||
|
If the disclaimer of warranty and limitation of liability provided
|
||||||
|
above cannot be given local legal effect according to their terms,
|
||||||
|
reviewing courts shall apply local law that most closely approximates
|
||||||
|
an absolute waiver of all civil liability in connection with the
|
||||||
|
Program, unless a warranty or assumption of liability accompanies a
|
||||||
|
copy of the Program in return for a fee.
|
||||||
|
|
||||||
|
END OF TERMS AND CONDITIONS
|
||||||
|
|
||||||
|
How to Apply These Terms to Your New Programs
|
||||||
|
|
||||||
|
If you develop a new program, and you want it to be of the greatest
|
||||||
|
possible use to the public, the best way to achieve this is to make it
|
||||||
|
free software which everyone can redistribute and change under these terms.
|
||||||
|
|
||||||
|
To do so, attach the following notices to the program. It is safest
|
||||||
|
to attach them to the start of each source file to most effectively
|
||||||
|
state the exclusion of warranty; and each file should have at least
|
||||||
|
the "copyright" line and a pointer to where the full notice is found.
|
||||||
|
|
||||||
|
<one line to give the program's name and a brief idea of what it does.>
|
||||||
|
Copyright (C) <year> <name of author>
|
||||||
|
|
||||||
|
This program is free software: you can redistribute it and/or modify
|
||||||
|
it under the terms of the GNU General Public License as published by
|
||||||
|
the Free Software Foundation, either version 3 of the License, or
|
||||||
|
(at your option) any later version.
|
||||||
|
|
||||||
|
This program is distributed in the hope that it will be useful,
|
||||||
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||||
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||||
|
GNU General Public License for more details.
|
||||||
|
|
||||||
|
You should have received a copy of the GNU General Public License
|
||||||
|
along with this program. If not, see <https://www.gnu.org/licenses/>.
|
||||||
|
|
||||||
|
Also add information on how to contact you by electronic and paper mail.
|
||||||
|
|
||||||
|
If the program does terminal interaction, make it output a short
|
||||||
|
notice like this when it starts in an interactive mode:
|
||||||
|
|
||||||
|
<program> Copyright (C) <year> <name of author>
|
||||||
|
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
|
||||||
|
This is free software, and you are welcome to redistribute it
|
||||||
|
under certain conditions; type `show c' for details.
|
||||||
|
|
||||||
|
The hypothetical commands `show w' and `show c' should show the appropriate
|
||||||
|
parts of the General Public License. Of course, your program's commands
|
||||||
|
might be different; for a GUI interface, you would use an "about box".
|
||||||
|
|
||||||
|
You should also get your employer (if you work as a programmer) or school,
|
||||||
|
if any, to sign a "copyright disclaimer" for the program, if necessary.
|
||||||
|
For more information on this, and how to apply and follow the GNU GPL, see
|
||||||
|
<https://www.gnu.org/licenses/>.
|
||||||
|
|
||||||
|
The GNU General Public License does not permit incorporating your program
|
||||||
|
into proprietary programs. If your program is a subroutine library, you
|
||||||
|
may consider it more useful to permit linking proprietary applications with
|
||||||
|
the library. If this is what you want to do, use the GNU Lesser General
|
||||||
|
Public License instead of this License. But first, please read
|
||||||
|
<https://www.gnu.org/philosophy/why-not-lgpl.html>.
|
||||||
26
README.rst
Normal file
@@ -0,0 +1,26 @@
===============
Ebook converter
===============

This is an unabashed rip-off of bits of the `Calibre project`_, aimed only at
the conversion functionality.

My motivation is to have just an ebook converter that runs from the command
line, without all of the bells and whistles Calibre has, and with a cleaner,
more *pythonic* approach.


Installation
------------

TBD.


License
-------

This work is licensed under the GPL3 license, like the original work. See the
LICENSE file for details.


.. _Calibre project: https://calibre-ebook.com/
681
ebook_converter/__init__.py
Normal file
@@ -0,0 +1,681 @@
from __future__ import unicode_literals, print_function
''' E-book management software'''
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import sys, os, re, time, random, warnings
from polyglot.builtins import codepoint_to_chr, unicode_type, range, hasenv, native_string_type
from math import floor
from functools import partial

if not hasenv('CALIBRE_SHOW_DEPRECATION_WARNINGS'):
    warnings.simplefilter('ignore', DeprecationWarning)
try:
    os.getcwd()
except EnvironmentError:
    os.chdir(os.path.expanduser('~'))

from calibre.constants import (iswindows, isosx, islinux, isfrozen,
        isbsd, preferred_encoding, __appname__, __version__, __author__,
        win32event, win32api, winerror, fcntl, ispy3,
        filesystem_encoding, plugins, config_dir)
from calibre.startup import winutil, winutilerror
from calibre.utils.icu import safe_chr

if False:
    # Prevent pyflakes from complaining
    winutil, winutilerror, __appname__, islinux, __version__
    fcntl, win32event, isfrozen, __author__
    winerror, win32api, isbsd, config_dir

_mt_inited = False


def _init_mimetypes():
    global _mt_inited
    import mimetypes
    mimetypes.init([P('mime.types')])
    _mt_inited = True


def guess_type(*args, **kwargs):
    import mimetypes
    if not _mt_inited:
        _init_mimetypes()
    return mimetypes.guess_type(*args, **kwargs)


def guess_all_extensions(*args, **kwargs):
    import mimetypes
    if not _mt_inited:
        _init_mimetypes()
    return mimetypes.guess_all_extensions(*args, **kwargs)


def guess_extension(*args, **kwargs):
    import mimetypes
    if not _mt_inited:
        _init_mimetypes()
    ext = mimetypes.guess_extension(*args, **kwargs)
    if not ext and args and args[0] == 'application/x-palmreader':
        ext = '.pdb'
    return ext


def get_types_map():
    import mimetypes
    if not _mt_inited:
        _init_mimetypes()
    return mimetypes.types_map

def to_unicode(raw, encoding='utf-8', errors='strict'):
    if isinstance(raw, unicode_type):
        return raw
    return raw.decode(encoding, errors)


def patheq(p1, p2):
    p = os.path
    d = lambda x : p.normcase(p.normpath(p.realpath(p.normpath(x))))
    if not p1 or not p2:
        return False
    return d(p1) == d(p2)


def unicode_path(path, abs=False):
    if isinstance(path, bytes):
        path = path.decode(filesystem_encoding)
    if abs:
        path = os.path.abspath(path)
    return path


def osx_version():
    if isosx:
        import platform
        src = platform.mac_ver()[0]
        m = re.match(r'(\d+)\.(\d+)\.(\d+)', src)
        if m:
            return int(m.group(1)), int(m.group(2)), int(m.group(3))


def confirm_config_name(name):
    return name + '_again'


_filename_sanitize_unicode = frozenset(('\\', '|', '?', '*', '<', # no2to3
    '"', ':', '>', '+', '/') + tuple(map(codepoint_to_chr, range(32)))) # no2to3


def sanitize_file_name(name, substitute='_'):
    '''
    Sanitize the filename `name`. All invalid characters are replaced by `substitute`.
    The set of invalid characters is the union of the invalid characters in Windows,
    macOS and Linux. Also removes leading and trailing whitespace.
    **WARNING:** This function also replaces path separators, so only pass file names
    and not full paths to it.
    '''
    if isbytestring(name):
        name = name.decode(filesystem_encoding, 'replace')
    if isbytestring(substitute):
        substitute = substitute.decode(filesystem_encoding, 'replace')
    chars = (substitute if c in _filename_sanitize_unicode else c for c in name)
    one = ''.join(chars)
    one = re.sub(r'\s', ' ', one).strip()
    bname, ext = os.path.splitext(one)
    one = re.sub(r'^\.+$', '_', bname)
    one = one.replace('..', substitute)
    one += ext
    # Windows doesn't like path components that end with a period or space
    if one and one[-1] in ('.', ' '):
        one = one[:-1]+'_'
    # Names starting with a period are hidden on Unix
    if one.startswith('.'):
        one = '_' + one[1:]
    return one


sanitize_file_name2 = sanitize_file_name_unicode = sanitize_file_name

def prints(*args, **kwargs):
    '''
    Print unicode arguments safely by encoding them to preferred_encoding
    Has the same signature as the print function from Python 3, except for the
    additional keyword argument safe_encode, which if set to True will cause the
    function to use repr when encoding fails.

    Returns the number of bytes written.
    '''
    file = kwargs.get('file', sys.stdout)
    file = getattr(file, 'buffer', file)
    enc = 'utf-8' if hasenv('CALIBRE_WORKER') else preferred_encoding
    sep = kwargs.get('sep', ' ')
    if not isinstance(sep, bytes):
        sep = sep.encode(enc)
    end = kwargs.get('end', '\n')
    if not isinstance(end, bytes):
        end = end.encode(enc)
    safe_encode = kwargs.get('safe_encode', False)
    count = 0
    for i, arg in enumerate(args):
        if isinstance(arg, unicode_type):
            if iswindows:
                from calibre.utils.terminal import Detect
                cs = Detect(file)
                if cs.is_console:
                    cs.write_unicode_text(arg)
                    count += len(arg)
                    if i != len(args)-1:
                        file.write(sep)
                        count += len(sep)
                    continue
            try:
                arg = arg.encode(enc)
            except UnicodeEncodeError:
                try:
                    arg = arg.encode('utf-8')
                except:
                    if not safe_encode:
                        raise
                    arg = repr(arg)
        if not isinstance(arg, bytes):
            try:
                arg = native_string_type(arg)
            except ValueError:
                arg = unicode_type(arg)
            if isinstance(arg, unicode_type):
                try:
                    arg = arg.encode(enc)
                except UnicodeEncodeError:
                    try:
                        arg = arg.encode('utf-8')
                    except:
                        if not safe_encode:
                            raise
                        arg = repr(arg)

        try:
            file.write(arg)
            count += len(arg)
        except:
            from polyglot import reprlib
            arg = reprlib.repr(arg)
            file.write(arg)
            count += len(arg)
        if i != len(args)-1:
            file.write(sep)
            count += len(sep)
    file.write(end)
    count += len(end)
    return count


class CommandLineError(Exception):
    pass

def setup_cli_handlers(logger, level):
    import logging
    if hasenv('CALIBRE_WORKER') and logger.handlers:
        return
    logger.setLevel(level)
    if level == logging.WARNING:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
        handler.setLevel(logging.WARNING)
    elif level == logging.INFO:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter())
        handler.setLevel(logging.INFO)
    elif level == logging.DEBUG:
        handler = logging.StreamHandler(sys.stderr)
        handler.setLevel(logging.DEBUG)
        handler.setFormatter(logging.Formatter('[%(levelname)s] %(filename)s:%(lineno)s: %(message)s'))

    logger.addHandler(handler)


def load_library(name, cdll):
    if iswindows:
        return cdll.LoadLibrary(name)
    if isosx:
        name += '.dylib'
        if hasattr(sys, 'frameworks_dir'):
            return cdll.LoadLibrary(os.path.join(getattr(sys, 'frameworks_dir'), name))
        return cdll.LoadLibrary(name)
    return cdll.LoadLibrary(name+'.so')


def extract(path, dir):
    extractor = None
    # First use the file header to identify its type
    with open(path, 'rb') as f:
        id_ = f.read(3)
    if id_ == b'Rar':
        from calibre.utils.unrar import extract as rarextract
        extractor = rarextract
    elif id_.startswith(b'PK'):
        from calibre.libunzip import extract as zipextract
        extractor = zipextract
    if extractor is None:
        # Fallback to file extension
        ext = os.path.splitext(path)[1][1:].lower()
        if ext in ['zip', 'cbz', 'epub', 'oebzip']:
            from calibre.libunzip import extract as zipextract
            extractor = zipextract
        elif ext in ['cbr', 'rar']:
            from calibre.utils.unrar import extract as rarextract
            extractor = rarextract
    if extractor is None:
        raise Exception('Unknown archive type')
    extractor(path, dir)


def get_proxies(debug=True):
    from polyglot.urllib import getproxies
    proxies = getproxies()
    for key, proxy in list(proxies.items()):
        if not proxy or '..' in proxy or key == 'auto':
            del proxies[key]
            continue
        if proxy.startswith(key+'://'):
            proxy = proxy[len(key)+3:]
        if key == 'https' and proxy.startswith('http://'):
            proxy = proxy[7:]
        if proxy.endswith('/'):
            proxy = proxy[:-1]
        if len(proxy) > 4:
            proxies[key] = proxy
        else:
            prints('Removing invalid', key, 'proxy:', proxy)
            del proxies[key]

    if proxies and debug:
        prints('Using proxies:', proxies)
    return proxies


def get_parsed_proxy(typ='http', debug=True):
    proxies = get_proxies(debug)
    proxy = proxies.get(typ, None)
    if proxy:
        pattern = re.compile((
            '(?:ptype://)?'
            '(?:(?P<user>\\w+):(?P<pass>.*)@)?'
            '(?P<host>[\\w\\-\\.]+)'
            '(?::(?P<port>\\d+))?').replace('ptype', typ)
        )

        match = pattern.match(proxies[typ])
        if match:
            try:
                ans = {
                    'host' : match.group('host'),
                    'port' : match.group('port'),
                    'user' : match.group('user'),
                    'pass' : match.group('pass')
                }
                if ans['port']:
                    ans['port'] = int(ans['port'])
            except:
                if debug:
                    import traceback
                    traceback.print_exc()
            else:
                if debug:
                    prints('Using http proxy', unicode_type(ans))
                return ans

def get_proxy_info(proxy_scheme, proxy_string):
    '''
    Parse all proxy information from a proxy string (as returned by
    get_proxies). The returned dict will have members set to None when the info
    is not available in the string. If an exception occurs parsing the string
    this method returns None.
    '''
    from polyglot.urllib import urlparse
    try:
        proxy_url = '%s://%s'%(proxy_scheme, proxy_string)
        urlinfo = urlparse(proxy_url)
        ans = {
            'scheme': urlinfo.scheme,
            'hostname': urlinfo.hostname,
            'port': urlinfo.port,
            'username': urlinfo.username,
            'password': urlinfo.password,
        }
    except Exception:
        return None
    return ans


# IE 11 on windows 7
USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko'
USER_AGENT_MOBILE = 'Mozilla/5.0 (Windows; U; Windows CE 5.1; rv:1.8.1a3) Gecko/20060610 Minimo/0.016'


def is_mobile_ua(ua):
    return 'Mobile/' in ua or 'Mobile ' in ua


def random_user_agent(choose=None, allow_ie=True):
    from calibre.utils.random_ua import common_user_agents
    ua_list = common_user_agents()
    ua_list = [x for x in ua_list if not is_mobile_ua(x)]
    if not allow_ie:
        ua_list = [x for x in ua_list if 'Trident/' not in x and 'Edge/' not in x]
    return random.choice(ua_list) if choose is None else ua_list[choose]


def browser(honor_time=True, max_time=2, mobile_browser=False, user_agent=None, verify_ssl_certificates=True, handle_refresh=True):
    '''
    Create a mechanize browser for web scraping. The browser handles cookies,
    refresh requests and ignores robots.txt. Also uses proxy if available.

    :param honor_time: If True honors pause time in refresh requests
    :param max_time: Maximum time in seconds to wait during a refresh request
    :param verify_ssl_certificates: If false SSL certificates errors are ignored
    '''
    from calibre.utils.browser import Browser
    opener = Browser(verify_ssl=verify_ssl_certificates)
    opener.set_handle_refresh(handle_refresh, max_time=max_time, honor_time=honor_time)
    opener.set_handle_robots(False)
    if user_agent is None:
        user_agent = USER_AGENT_MOBILE if mobile_browser else USER_AGENT
    opener.addheaders = [('User-agent', user_agent)]
    proxies = get_proxies()
    to_add = {}
    http_proxy = proxies.get('http', None)
    if http_proxy:
        to_add['http'] = http_proxy
    https_proxy = proxies.get('https', None)
    if https_proxy:
        to_add['https'] = https_proxy
    if to_add:
        opener.set_proxies(to_add)

    return opener


def fit_image(width, height, pwidth, pheight):
    '''
    Fit image in box of width pwidth and height pheight.
    @param width: Width of image
    @param height: Height of image
    @param pwidth: Width of box
    @param pheight: Height of box
    @return: scaled, new_width, new_height. scaled is True iff new_width and/or new_height is different from width or height.
    '''
    scaled = height > pheight or width > pwidth
    if height > pheight:
        corrf = pheight / float(height)
        width, height = floor(corrf*width), pheight
    if width > pwidth:
        corrf = pwidth / float(width)
        width, height = pwidth, floor(corrf*height)
    if height > pheight:
        corrf = pheight / float(height)
        width, height = floor(corrf*width), pheight

    return scaled, int(width), int(height)

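The scale-to-fit logic in `fit_image` above can be exercised on its own. A minimal, dependency-free sketch (the name `fit_box` is hypothetical, not part of the module) that mirrors the same aspect-preserving, downscale-only computation:

```python
from math import floor

def fit_box(width, height, pwidth, pheight):
    # Mirrors fit_image() above: shrink to fit the box while
    # preserving aspect ratio; never upscales a smaller image.
    scaled = height > pheight or width > pwidth
    if height > pheight:
        corrf = pheight / float(height)
        width, height = floor(corrf * width), pheight
    if width > pwidth:
        corrf = pwidth / float(width)
        width, height = pwidth, floor(corrf * height)
    return scaled, int(width), int(height)

print(fit_box(2000, 1000, 1000, 1000))  # (True, 1000, 500)
```

A 2000x1000 image halves to 1000x500 inside a 1000x1000 box, while an image already inside the box is returned unchanged with `scaled` False.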
class CurrentDir(object):

    def __init__(self, path):
        self.path = path
        self.cwd = None

    def __enter__(self, *args):
        self.cwd = os.getcwd()
        os.chdir(self.path)
        return self.cwd

    def __exit__(self, *args):
        try:
            os.chdir(self.cwd)
        except EnvironmentError:
            # The previous CWD no longer exists
            pass


_ncpus = None


if ispy3:
    def detect_ncpus():
        global _ncpus
        if _ncpus is None:
            _ncpus = max(1, os.cpu_count() or 1)
        return _ncpus
else:
    def detect_ncpus():
        """Detects the number of effective CPUs in the system"""
        global _ncpus
        if _ncpus is None:
            if iswindows:
                import win32api
                ans = win32api.GetSystemInfo()[5]
            else:
                import multiprocessing
                ans = -1
                try:
                    ans = multiprocessing.cpu_count()
                except Exception:
                    from PyQt5.Qt import QThread
                    ans = QThread.idealThreadCount()
            _ncpus = max(1, ans)
        return _ncpus


relpath = os.path.relpath


def walk(dir):
    ''' A nice interface to os.walk '''
    for record in os.walk(dir):
        for f in record[-1]:
            yield os.path.join(record[0], f)


def strftime(fmt, t=None):
    ''' A version of strftime that returns unicode strings and tries to handle dates
    before 1900 '''
    if not fmt:
        return ''
    if t is None:
        t = time.localtime()
    if hasattr(t, 'timetuple'):
        t = t.timetuple()
    early_year = t[0] < 1900
    if early_year:
        replacement = 1900 if t[0]%4 == 0 else 1901
        fmt = fmt.replace('%Y', '_early year hack##')
        t = list(t)
        orig_year = t[0]
        t[0] = replacement
        t = time.struct_time(t)
    ans = None
    if iswindows:
        if isinstance(fmt, bytes):
            fmt = fmt.decode('mbcs', 'replace')
        fmt = fmt.replace('%e', '%#d')
        ans = plugins['winutil'][0].strftime(fmt, t)
    else:
        ans = time.strftime(fmt, t)
        if isinstance(ans, bytes):
            ans = ans.decode(preferred_encoding, 'replace')
    if early_year:
        ans = ans.replace('_early year hack##', unicode_type(orig_year))
    return ans


def my_unichr(num):
    try:
        return safe_chr(num)
    except (ValueError, OverflowError):
        return '?'


def entity_to_unicode(match, exceptions=[], encoding='cp1252',
        result_exceptions={}):
    '''
    :param match: A match object such that '&'+match.group(1)';' is the entity.

    :param exceptions: A list of entities to not convert (Each entry is the name of the entity, for e.g. 'apos' or '#1234'

    :param encoding: The encoding to use to decode numeric entities between 128 and 256.
    If None, the Unicode UCS encoding is used. A common encoding is cp1252.

    :param result_exceptions: A mapping of characters to entities. If the result
    is in result_exceptions, result_exception[result] is returned instead.
    Convenient way to specify exception for things like < or > that can be
    specified by various actual entities.
    '''
    def check(ch):
        return result_exceptions.get(ch, ch)

    ent = match.group(1)
    if ent in exceptions:
        return '&'+ent+';'
    if ent in {'apos', 'squot'}: # squot is generated by some broken CMS software
        return check("'")
    if ent == 'hellips':
        ent = 'hellip'
    if ent.startswith('#'):
        try:
            if ent[1] in ('x', 'X'):
                num = int(ent[2:], 16)
            else:
                num = int(ent[1:])
        except:
            return '&'+ent+';'
        if encoding is None or num > 255:
            return check(my_unichr(num))
        try:
            return check(bytes(bytearray((num,))).decode(encoding))
        except UnicodeDecodeError:
            return check(my_unichr(num))
    from calibre.ebooks.html_entities import html5_entities
    try:
        return check(html5_entities[ent])
    except KeyError:
        pass
    from polyglot.html_entities import name2codepoint
    try:
        return check(my_unichr(name2codepoint[ent]))
    except KeyError:
        return '&'+ent+';'

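The core of `entity_to_unicode` above — decode named and numeric HTML entities, and pass unknown ones through untouched — can be sketched with only the standard library (the name `demo_replace` is hypothetical; the real function also handles `result_exceptions` and a byte-encoding fallback):

```python
import re
from html.entities import name2codepoint

_pat = re.compile(r'&(\S+?);')

def demo_replace(raw):
    # Decode &name;, &#NNN; and &#xHH; entities; leave anything
    # unrecognised exactly as it appeared in the input.
    def sub(m):
        ent = m.group(1)
        if ent.startswith('#'):
            try:
                num = int(ent[2:], 16) if ent[1] in 'xX' else int(ent[1:])
                return chr(num)
            except (ValueError, IndexError):
                return m.group(0)
        cp = name2codepoint.get(ent)
        return chr(cp) if cp is not None else m.group(0)
    return _pat.sub(sub, raw)

print(demo_replace('fish &amp; chips &#64; caf&eacute;'))  # fish & chips @ café
```

The non-greedy `\S+?` in the pattern matches the same way as the module's `_ent_pat` below.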
_ent_pat = re.compile(r'&(\S+?);')
xml_entity_to_unicode = partial(entity_to_unicode, result_exceptions={
    '"' : '&quot;',
    "'" : '&apos;',
    '<' : '&lt;',
    '>' : '&gt;',
    '&' : '&amp;'})


def replace_entities(raw, encoding='cp1252'):
    return _ent_pat.sub(partial(entity_to_unicode, encoding=encoding), raw)


def xml_replace_entities(raw, encoding='cp1252'):
    return _ent_pat.sub(partial(xml_entity_to_unicode, encoding=encoding), raw)


def prepare_string_for_xml(raw, attribute=False):
    raw = _ent_pat.sub(entity_to_unicode, raw)
    raw = raw.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')
    if attribute:
        raw = raw.replace('"', '&quot;').replace("'", '&apos;')
    return raw


def isbytestring(obj):
    return isinstance(obj, bytes)


def force_unicode(obj, enc=preferred_encoding):
    if isbytestring(obj):
        try:
            obj = obj.decode(enc)
        except Exception:
            try:
                obj = obj.decode(filesystem_encoding if enc ==
                        preferred_encoding else preferred_encoding)
            except Exception:
                try:
                    obj = obj.decode('utf-8')
                except Exception:
                    obj = repr(obj)
                    if isbytestring(obj):
                        obj = obj.decode('utf-8')
    return obj


def as_unicode(obj, enc=preferred_encoding):
    if not isbytestring(obj):
        try:
            obj = unicode_type(obj)
        except Exception:
            try:
                obj = native_string_type(obj)
            except Exception:
                obj = repr(obj)
    return force_unicode(obj, enc=enc)


def url_slash_cleaner(url):
    '''
    Removes redundant /'s from url's.
    '''
    return re.sub(r'(?<!:)/{2,}', '/', url)
|
||||||
|
|
||||||
|
|
||||||
|
def human_readable(size, sep=' '):
|
||||||
|
""" Convert a size in bytes into a human readable form """
|
||||||
|
divisor, suffix = 1, "B"
|
||||||
|
for i, candidate in enumerate(('B', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB')):
|
||||||
|
if size < (1 << ((i + 1) * 10)):
|
||||||
|
divisor, suffix = (1 << (i * 10)), candidate
|
||||||
|
break
|
||||||
|
size = unicode_type(float(size)/divisor)
|
||||||
|
if size.find(".") > -1:
|
||||||
|
size = size[:size.find(".")+2]
|
||||||
|
if size.endswith('.0'):
|
||||||
|
size = size[:-2]
|
||||||
|
return size + sep + suffix
|
||||||
|
|
||||||
|
|
||||||
|
def ipython(user_ns=None):
|
||||||
|
from calibre.utils.ipython import ipython
|
||||||
|
ipython(user_ns=user_ns)
|
||||||
|
|
||||||
|
|
||||||
|
def fsync(fileobj):
|
||||||
|
fileobj.flush()
|
||||||
|
os.fsync(fileobj.fileno())
|
||||||
|
if islinux and getattr(fileobj, 'name', None):
|
||||||
|
# On Linux kernels after 5.1.9 and 4.19.50 using fsync without any
|
||||||
|
# following activity causes Kindles to eject. Instead of fixing this in
|
||||||
|
# the obvious way, which is to have the kernel send some harmless
|
||||||
|
# filesystem activity after the FSYNC, the kernel developers seem to
|
||||||
|
# think the correct solution is to disable FSYNC using a mount flag
|
||||||
|
# which users will have to turn on manually. So instead we create some
|
||||||
|
# harmless filesystem activity, and who cares about performance.
|
||||||
|
# See https://bugs.launchpad.net/calibre/+bug/1834641
|
||||||
|
# and https://bugzilla.kernel.org/show_bug.cgi?id=203973
|
||||||
|
# To check for the existence of the bug, simply run:
|
||||||
|
# python -c "p = '/run/media/kovid/Kindle/driveinfo.calibre'; f = open(p, 'r+b'); os.fsync(f.fileno());"
|
||||||
|
# this will cause the Kindle to disconnect.
|
||||||
|
try:
|
||||||
|
os.utime(fileobj.name, None)
|
||||||
|
except Exception:
|
||||||
|
import traceback
|
||||||
|
traceback.print_exc()
|
||||||
343	ebook_converter/constants.py	Normal file
@@ -0,0 +1,343 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2015, Kovid Goyal <kovid at kovidgoyal.net>
from __future__ import print_function, unicode_literals
from polyglot.builtins import map, unicode_type, environ_item, hasenv, getenv, as_unicode, native_string_type
import sys, locale, codecs, os, importlib, collections

__appname__ = 'calibre'
numeric_version = (4, 12, 0)
__version__ = '.'.join(map(unicode_type, numeric_version))
git_version = None
__author__ = "Kovid Goyal <kovid@kovidgoyal.net>"

'''
Various run time constants.
'''


_plat = sys.platform.lower()
iswindows = 'win32' in _plat or 'win64' in _plat
isosx = 'darwin' in _plat
isnewosx = isosx and getattr(sys, 'new_app_bundle', False)
isfreebsd = 'freebsd' in _plat
isnetbsd = 'netbsd' in _plat
isdragonflybsd = 'dragonfly' in _plat
isbsd = isfreebsd or isnetbsd or isdragonflybsd
ishaiku = 'haiku1' in _plat
islinux = not (iswindows or isosx or isbsd or ishaiku)
isfrozen = hasattr(sys, 'frozen')
isunix = isosx or islinux or ishaiku
isportable = hasenv('CALIBRE_PORTABLE_BUILD')
ispy3 = sys.version_info.major > 2
isxp = isoldvista = False
if iswindows:
    wver = sys.getwindowsversion()
    isxp = wver.major < 6
    isoldvista = wver.build < 6002
is64bit = sys.maxsize > (1 << 32)
isworker = hasenv('CALIBRE_WORKER') or hasenv('CALIBRE_SIMPLE_WORKER')
if isworker:
    os.environ.pop(environ_item('CALIBRE_FORCE_ANSI'), None)
FAKE_PROTOCOL, FAKE_HOST = 'clbr', 'internal.invalid'
VIEWER_APP_UID = 'com.calibre-ebook.viewer'
EDITOR_APP_UID = 'com.calibre-ebook.edit-book'
MAIN_APP_UID = 'com.calibre-ebook.main-gui'
STORE_DIALOG_APP_UID = 'com.calibre-ebook.store-dialog'
TOC_DIALOG_APP_UID = 'com.calibre-ebook.toc-editor'
try:
    preferred_encoding = locale.getpreferredencoding()
    codecs.lookup(preferred_encoding)
except:
    preferred_encoding = 'utf-8'

win32event = importlib.import_module('win32event') if iswindows else None
winerror = importlib.import_module('winerror') if iswindows else None
win32api = importlib.import_module('win32api') if iswindows else None
fcntl = None if iswindows else importlib.import_module('fcntl')
dark_link_color = '#6cb4ee'

_osx_ver = None


def get_osx_version():
    global _osx_ver
    if _osx_ver is None:
        import platform
        from collections import namedtuple
        OSX = namedtuple('OSX', 'major minor tertiary')
        try:
            ver = platform.mac_ver()[0].split('.')
            if len(ver) == 2:
                ver.append(0)
            _osx_ver = OSX(*map(int, ver))  # no2to3
        except Exception:
            _osx_ver = OSX(0, 0, 0)
    return _osx_ver


filesystem_encoding = sys.getfilesystemencoding()
if filesystem_encoding is None:
    filesystem_encoding = 'utf-8'
else:
    try:
        if codecs.lookup(filesystem_encoding).name == 'ascii':
            filesystem_encoding = 'utf-8'
            # On linux, unicode arguments to os file functions are coerced to an ascii
            # bytestring if sys.getfilesystemencoding() == 'ascii', which is
            # just plain dumb. This is fixed by the icu.py module which, when
            # imported changes ascii to utf-8
    except Exception:
        filesystem_encoding = 'utf-8'


DEBUG = hasenv('CALIBRE_DEBUG')


def debug():
    global DEBUG
    DEBUG = True


def _get_cache_dir():
    import errno
    confcache = os.path.join(config_dir, 'caches')
    try:
        os.makedirs(confcache)
    except EnvironmentError as err:
        if err.errno != errno.EEXIST:
            raise
    if isportable:
        return confcache
    ccd = getenv('CALIBRE_CACHE_DIRECTORY')
    if ccd is not None:
        ans = os.path.abspath(ccd)
        try:
            os.makedirs(ans)
            return ans
        except EnvironmentError as err:
            if err.errno == errno.EEXIST:
                return ans

    if iswindows:
        w = plugins['winutil'][0]
        try:
            candidate = os.path.join(w.special_folder_path(w.CSIDL_LOCAL_APPDATA), '%s-cache'%__appname__)
        except ValueError:
            return confcache
    elif isosx:
        candidate = os.path.join(os.path.expanduser('~/Library/Caches'), __appname__)
    else:
        candidate = getenv('XDG_CACHE_HOME', '~/.cache')
        candidate = os.path.join(os.path.expanduser(candidate),
                                 __appname__)
    if isinstance(candidate, bytes):
        try:
            candidate = candidate.decode(filesystem_encoding)
        except ValueError:
            candidate = confcache
    try:
        os.makedirs(candidate)
    except EnvironmentError as err:
        if err.errno != errno.EEXIST:
            candidate = confcache
    return candidate


def cache_dir():
    ans = getattr(cache_dir, 'ans', None)
    if ans is None:
        ans = cache_dir.ans = os.path.realpath(_get_cache_dir())
    return ans


plugins_loc = sys.extensions_location
if ispy3:
    plugins_loc = os.path.join(plugins_loc, '3')


# plugins {{{


class Plugins(collections.Mapping):

    def __init__(self):
        self._plugins = {}
        plugins = [
            'pictureflow',
            'lzx',
            'msdes',
            'podofo',
            'cPalmdoc',
            'progress_indicator',
            'chmlib',
            'icu',
            'speedup',
            'html_as_json',
            'unicode_names',
            'html_syntax_highlighter',
            'hyphen',
            'freetype',
            'imageops',
            'hunspell',
            '_patiencediff_c',
            'bzzdec',
            'matcher',
            'tokenizer',
            'certgen',
            'lzma_binding',
        ]
        if not ispy3:
            plugins.extend([
                'monotonic',
                'zlib2',
            ])
        if iswindows:
            plugins.extend(['winutil', 'wpd', 'winfonts'])
        if isosx:
            plugins.append('usbobserver')
            plugins.append('cocoa')
        if isfreebsd or ishaiku or islinux or isosx:
            plugins.append('libusb')
            plugins.append('libmtp')
        self.plugins = frozenset(plugins)

    def load_plugin(self, name):
        if name in self._plugins:
            return
        sys.path.insert(0, plugins_loc)
        try:
            del sys.modules[name]
        except KeyError:
            pass
        plugin_err = ''
        try:
            p = importlib.import_module(name)
        except Exception as err:
            p = None
            try:
                plugin_err = unicode_type(err)
            except Exception:
                plugin_err = as_unicode(native_string_type(err), encoding=preferred_encoding, errors='replace')
        self._plugins[name] = p, plugin_err
        sys.path.remove(plugins_loc)

    def __iter__(self):
        return iter(self.plugins)

    def __len__(self):
        return len(self.plugins)

    def __contains__(self, name):
        return name in self.plugins

    def __getitem__(self, name):
        if name not in self.plugins:
            raise KeyError('No plugin named %r'%name)
        self.load_plugin(name)
        return self._plugins[name]


plugins = None
if plugins is None:
    plugins = Plugins()
# }}}

# config_dir {{{

CONFIG_DIR_MODE = 0o700

cconfd = getenv('CALIBRE_CONFIG_DIRECTORY')
if cconfd is not None:
    config_dir = os.path.abspath(cconfd)
elif iswindows:
    if plugins['winutil'][0] is None:
        raise Exception(plugins['winutil'][1])
    try:
        config_dir = plugins['winutil'][0].special_folder_path(plugins['winutil'][0].CSIDL_APPDATA)
    except ValueError:
        config_dir = None
    if not config_dir or not os.access(config_dir, os.W_OK|os.X_OK):
        config_dir = os.path.expanduser('~')
    config_dir = os.path.join(config_dir, 'calibre')
elif isosx:
    config_dir = os.path.expanduser('~/Library/Preferences/calibre')
else:
    bdir = os.path.abspath(os.path.expanduser(getenv('XDG_CONFIG_HOME', '~/.config')))
    config_dir = os.path.join(bdir, 'calibre')
    try:
        os.makedirs(config_dir, mode=CONFIG_DIR_MODE)
    except:
        pass
    if not os.path.exists(config_dir) or \
            not os.access(config_dir, os.W_OK) or not \
            os.access(config_dir, os.X_OK):
        print('No write access to', config_dir, 'using a temporary dir instead')
        import tempfile, atexit
        config_dir = tempfile.mkdtemp(prefix='calibre-config-')

        def cleanup_cdir():
            try:
                import shutil
                shutil.rmtree(config_dir)
            except:
                pass
        atexit.register(cleanup_cdir)
# }}}


dv = getenv('CALIBRE_DEVELOP_FROM')
is_running_from_develop = bool(getattr(sys, 'frozen', False) and dv and os.path.abspath(dv) in sys.path)
del dv


def get_version():
    '''Return version string for display to user '''
    if git_version is not None:
        v = git_version
    else:
        v = __version__
        if numeric_version[-1] == 0:
            v = v[:-2]
    if is_running_from_develop:
        v += '*'
    if iswindows and is64bit:
        v += ' [64bit]'

    return v


def get_portable_base():
    'Return path to the directory that contains calibre-portable.exe or None'
    if isportable:
        return os.path.dirname(os.path.dirname(getenv('CALIBRE_PORTABLE_BUILD')))


def get_windows_username():
    '''
    Return the user name of the currently logged in user as a unicode string.
    Note that usernames on windows are case insensitive, the case of the value
    returned depends on what the user typed into the login box at login time.
    '''
    username = plugins['winutil'][0].username
    return username()


def get_windows_temp_path():
    temp_path = plugins['winutil'][0].temp_path
    return temp_path()


def get_windows_user_locale_name():
    locale_name = plugins['winutil'][0].locale_name
    return locale_name()


def get_windows_number_formats():
    ans = getattr(get_windows_number_formats, 'ans', None)
    if ans is None:
        localeconv = plugins['winutil'][0].localeconv
        d = localeconv()
        thousands_sep, decimal_point = d['thousands_sep'], d['decimal_point']
        ans = get_windows_number_formats.ans = thousands_sep, decimal_point
    return ans
12	ebook_converter/css_selectors/__init__.py	Normal file
@@ -0,0 +1,12 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2015, Kovid Goyal <kovid at kovidgoyal.net>'

from css_selectors.parser import parse
from css_selectors.select import Select, INAPPROPRIATE_PSEUDO_CLASSES
from css_selectors.errors import SelectorError, SelectorSyntaxError, ExpressionError

__all__ = ['parse', 'Select', 'INAPPROPRIATE_PSEUDO_CLASSES', 'SelectorError', 'SelectorSyntaxError', 'ExpressionError']
18	ebook_converter/css_selectors/errors.py	Normal file
@@ -0,0 +1,18 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2015, Kovid Goyal <kovid at kovidgoyal.net>'


class SelectorError(ValueError):

    """Common parent for SelectorSyntaxError and ExpressionError"""


class SelectorSyntaxError(SelectorError):

    """Parsing a selector that does not match the grammar."""


class ExpressionError(SelectorError):

    """Unknown or unsupported selector (eg. pseudo-class)."""
133	ebook_converter/css_selectors/ordered_set.py	Normal file
@@ -0,0 +1,133 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2015, Kovid Goyal <kovid at kovidgoyal.net>'

import collections
from polyglot.builtins import string_or_bytes

SLICE_ALL = slice(None)


def is_iterable(obj):
    """
    Are we being asked to look up a list of things, instead of a single thing?
    We check for the `__iter__` attribute so that this can cover types that
    don't have to be known by this module, such as NumPy arrays.

    Strings, however, should be considered as atomic values to look up, not
    iterables.
    """
    return hasattr(obj, '__iter__') and not isinstance(obj, string_or_bytes)


class OrderedSet(collections.MutableSet):
    """
    An OrderedSet is a custom MutableSet that remembers its order, so that
    every entry has an index that can be looked up.
    """
    def __init__(self, iterable=None):
        self.items = []
        self.map = {}
        if iterable is not None:
            for item in iterable:
                idx = self.map.get(item)
                if idx is None:
                    self.map[item] = len(self.items)
                    self.items.append(item)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        """
        Get the item at a given index.

        If `index` is a slice, you will get back that slice of items. If it's
        the slice [:], exactly the same object is returned. (If you want an
        independent copy of an OrderedSet, use `OrderedSet.copy()`.)

        If `index` is an iterable, you'll get the OrderedSet of items
        corresponding to those indices. This is similar to NumPy's
        "fancy indexing".
        """
        if index == SLICE_ALL:
            return self
        elif hasattr(index, '__index__') or isinstance(index, slice):
            result = self.items[index]
            if isinstance(result, list):
                return OrderedSet(result)
            else:
                return result
        elif is_iterable(index):
            return OrderedSet([self.items[i] for i in index])
        else:
            raise TypeError("Don't know how to index an OrderedSet by %r" %
                            index)

    def copy(self):
        return OrderedSet(self)

    def __getstate__(self):
        return tuple(self)

    def __setstate__(self, state):
        self.__init__(state)

    def __contains__(self, key):
        return key in self.map

    def add(self, key):
        """
        Add `key` as an item to this OrderedSet, then return its index.

        If `key` is already in the OrderedSet, return the index it already
        had.
        """
        index = self.map.get(key)
        if index is None:
            self.map[key] = index = len(self.items)
            self.items.append(key)
        return index

    def index(self, key):
        """
        Get the index of a given entry, raising an IndexError if it's not
        present.

        `key` can be an iterable of entries that is not a string, in which case
        this returns a list of indices.
        """
        if is_iterable(key):
            return [self.index(subkey) for subkey in key]
        return self.map[key]

    def discard(self, key):
        index = self.map.get(key)
        if index is not None:
            del self.map[key]  # drop the key from the index map as well
            self.items.pop(index)
            for item in self.items[index:]:
                self.map[item] -= 1
            return True
        return False

    def __iter__(self):
        return iter(self.items)

    def __reversed__(self):
        return reversed(self.items)

    def __repr__(self):
        if not self:
            return '%s()' % (self.__class__.__name__,)
        return '%s(%r)' % (self.__class__.__name__, list(self))

    def __eq__(self, other):
        if isinstance(other, OrderedSet):
            return len(self) == len(other) and self.items == other.items
        try:
            return type(other)(self.map) == other
        except TypeError:
            return False
791	ebook_converter/css_selectors/parser.py	Normal file
@@ -0,0 +1,791 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
"""
Tokenizer, parser and parsed objects for CSS selectors.

:copyright: (c) 2007-2012 Ian Bicking and contributors.
See AUTHORS for more details.
:license: BSD, see LICENSE for more details.

"""

import sys
import re
import operator
import string

from css_selectors.errors import SelectorSyntaxError, ExpressionError
from polyglot.builtins import unicode_type, codepoint_to_chr, range


utab = {c: c + 32 for c in range(ord(u'A'), ord(u'Z') + 1)}

if sys.version_info.major < 3:
    tab = string.maketrans(string.ascii_uppercase, string.ascii_lowercase)

    def ascii_lower(string):
        """Lower-case, but only in the ASCII range."""
        return string.translate(utab if isinstance(string, unicode_type) else tab)

    def urepr(x):
        if isinstance(x, list):
            return '[%s]' % ', '.join(map(urepr, x))
        ans = repr(x)
        if ans.startswith("u'") or ans.startswith('u"'):
            ans = ans[1:]
        return ans


else:

    def ascii_lower(x):
        return x.translate(utab)

    urepr = repr


# Parsed objects

class Selector(object):

    """
    Represents a parsed selector.
    """

    def __init__(self, tree, pseudo_element=None):
        self.parsed_tree = tree
        if pseudo_element is not None and not isinstance(
                pseudo_element, FunctionalPseudoElement):
            pseudo_element = ascii_lower(pseudo_element)
        #: A :class:`FunctionalPseudoElement`,
        #: or the identifier for the pseudo-element as a string,
        #  or ``None``.
        #:
        #: +-------------------------+----------------+--------------------------------+
        #: |                         | Selector       | Pseudo-element                 |
        #: +=========================+================+================================+
        #: | CSS3 syntax             | ``a::before``  | ``'before'``                   |
        #: +-------------------------+----------------+--------------------------------+
        #: | Older syntax            | ``a:before``   | ``'before'``                   |
        #: +-------------------------+----------------+--------------------------------+
        #: | From the Lists3_ draft, | ``li::marker`` | ``'marker'``                   |
        #: | not in Selectors3       |                |                                |
        #: +-------------------------+----------------+--------------------------------+
        #: | Invalid pseudo-class    | ``li:marker``  | ``None``                       |
        #: +-------------------------+----------------+--------------------------------+
        #: | Functional              | ``a::foo(2)``  | ``FunctionalPseudoElement(…)`` |
        #: +-------------------------+----------------+--------------------------------+
        #:
        #  : .. _Lists3: http://www.w3.org/TR/2011/WD-css3-lists-20110524/#marker-pseudoelement
        self.pseudo_element = pseudo_element

    def __repr__(self):
        if isinstance(self.pseudo_element, FunctionalPseudoElement):
            pseudo_element = repr(self.pseudo_element)
        if self.pseudo_element:
            pseudo_element = '::%s' % self.pseudo_element
        else:
            pseudo_element = ''
        return '%s[%r%s]' % (
            self.__class__.__name__, self.parsed_tree, pseudo_element)

    def specificity(self):
        """Return the specificity_ of this selector as a tuple of 3 integers.

        .. _specificity: http://www.w3.org/TR/selectors/#specificity

        """
        a, b, c = self.parsed_tree.specificity()
        if self.pseudo_element:
            c += 1
        return a, b, c


class Class(object):

    """
    Represents selector.class_name
    """
    def __init__(self, selector, class_name):
        self.selector = selector
        self.class_name = class_name

    def __repr__(self):
        return '%s[%r.%s]' % (
            self.__class__.__name__, self.selector, self.class_name)

    def specificity(self):
        a, b, c = self.selector.specificity()
        b += 1
        return a, b, c


class FunctionalPseudoElement(object):

    """
    Represents selector::name(arguments)

    .. attribute:: name

        The name (identifier) of the pseudo-element, as a string.

    .. attribute:: arguments

        The arguments of the pseudo-element, as a list of tokens.

        **Note:** tokens are not part of the public API,
        and may change between versions.
        Use at your own risks.

    """
    def __init__(self, name, arguments):
        self.name = ascii_lower(name)
        self.arguments = arguments

    def __repr__(self):
        return '%s[::%s(%s)]' % (
            self.__class__.__name__, self.name,
            urepr([token.value for token in self.arguments]))

    def argument_types(self):
        return [token.type for token in self.arguments]

    def specificity(self):
        a, b, c = self.selector.specificity()
        b += 1
        return a, b, c


class Function(object):

    """
    Represents selector:name(expr)
    """
    def __init__(self, selector, name, arguments):
        self.selector = selector
        self.name = ascii_lower(name)
        self.arguments = arguments
        self._parsed_arguments = None

    def __repr__(self):
        return '%s[%r:%s(%s)]' % (
            self.__class__.__name__, self.selector, self.name,
            urepr([token.value for token in self.arguments]))

    def argument_types(self):
        return [token.type for token in self.arguments]

    @property
    def parsed_arguments(self):
        if self._parsed_arguments is None:
            try:
                self._parsed_arguments = parse_series(self.arguments)
            except ValueError:
                raise ExpressionError("Invalid series: '%r'" % self.arguments)
        return self._parsed_arguments

    def parse_arguments(self):
        if not self.arguments_parsed:
            self.arguments_parsed = True

    def specificity(self):
        a, b, c = self.selector.specificity()
        b += 1
        return a, b, c


class Pseudo(object):

    """
    Represents selector:ident
    """
    def __init__(self, selector, ident):
        self.selector = selector
        self.ident = ascii_lower(ident)

    def __repr__(self):
        return '%s[%r:%s]' % (
            self.__class__.__name__, self.selector, self.ident)

    def specificity(self):
        a, b, c = self.selector.specificity()
        b += 1
        return a, b, c


class Negation(object):

    """
    Represents selector:not(subselector)
    """
    def __init__(self, selector, subselector):
        self.selector = selector
        self.subselector = subselector

    def __repr__(self):
        return '%s[%r:not(%r)]' % (
            self.__class__.__name__, self.selector, self.subselector)

    def specificity(self):
        a1, b1, c1 = self.selector.specificity()
        a2, b2, c2 = self.subselector.specificity()
        return a1 + a2, b1 + b2, c1 + c2


class Attrib(object):

    """
    Represents selector[namespace|attrib operator value]
    """
    def __init__(self, selector, namespace, attrib, operator, value):
        self.selector = selector
        self.namespace = namespace
        self.attrib = attrib
        self.operator = operator
        self.value = value

    def __repr__(self):
        if self.namespace:
            attrib = '%s|%s' % (self.namespace, self.attrib)
        else:
            attrib = self.attrib
        if self.operator == 'exists':
            return '%s[%r[%s]]' % (
                self.__class__.__name__, self.selector, attrib)
        else:
            return '%s[%r[%s %s %s]]' % (
                self.__class__.__name__, self.selector, attrib,
                self.operator, urepr(self.value))

    def specificity(self):
        a, b, c = self.selector.specificity()
        b += 1
        return a, b, c


class Element(object):

    """
    Represents namespace|element

    `None` is for the universal selector '*'

    """
    def __init__(self, namespace=None, element=None):
        self.namespace = namespace
        self.element = element

    def __repr__(self):
        element = self.element or '*'
        if self.namespace:
            element = '%s|%s' % (self.namespace, element)
        return '%s[%s]' % (self.__class__.__name__, element)

    def specificity(self):
        if self.element:
            return 0, 0, 1
        else:
            return 0, 0, 0


class Hash(object):

    """
    Represents selector#id
    """
    def __init__(self, selector, id):
        self.selector = selector
        self.id = id

    def __repr__(self):
        return '%s[%r#%s]' % (
            self.__class__.__name__, self.selector, self.id)

    def specificity(self):
        a, b, c = self.selector.specificity()
        a += 1
        return a, b, c


class CombinedSelector(object):

    def __init__(self, selector, combinator, subselector):
        assert selector is not None
        self.selector = selector
        self.combinator = combinator
        self.subselector = subselector

    def __repr__(self):
        if self.combinator == ' ':
            comb = '<followed>'
        else:
            comb = self.combinator
        return '%s[%r %s %r]' % (
            self.__class__.__name__, self.selector, comb, self.subselector)
|
def specificity(self):
|
||||||
|
a1, b1, c1 = self.selector.specificity()
|
||||||
|
a2, b2, c2 = self.subselector.specificity()
|
||||||
|
return a1 + a2, b1 + b2, c1 + c2
|
||||||
|
|
||||||
|
|
||||||
|
# Parser
|
||||||
|
|
||||||
|
# foo
|
||||||
|
_el_re = re.compile(r'^[ \t\r\n\f]*([a-zA-Z]+)[ \t\r\n\f]*$')
|
||||||
|
|
||||||
|
# foo#bar or #bar
|
||||||
|
_id_re = re.compile(r'^[ \t\r\n\f]*([a-zA-Z]*)#([a-zA-Z0-9_-]+)[ \t\r\n\f]*$')
|
||||||
|
|
||||||
|
# foo.bar or .bar
|
||||||
|
_class_re = re.compile(
|
||||||
|
r'^[ \t\r\n\f]*([a-zA-Z]*)\.([a-zA-Z][a-zA-Z0-9_-]*)[ \t\r\n\f]*$')
|
||||||
|
|
||||||
|
|
||||||
|
def parse(css):
    """Parse a CSS *group of selectors*.

    :param css:
        A *group of selectors* as a Unicode string.
    :raises:
        :class:`SelectorSyntaxError` on invalid selectors.
    :returns:
        A list of parsed :class:`Selector` objects, one for each
        selector in the comma-separated group.

    """
    # Fast path for simple cases
    match = _el_re.match(css)
    if match:
        return [Selector(Element(element=match.group(1)))]
    match = _id_re.match(css)
    if match is not None:
        return [Selector(Hash(Element(element=match.group(1) or None),
                              match.group(2)))]
    match = _class_re.match(css)
    if match is not None:
        return [Selector(Class(Element(element=match.group(1) or None),
                               match.group(2)))]

    stream = TokenStream(tokenize(css))
    stream.source = css
    return list(parse_selector_group(stream))
    # except SelectorSyntaxError:
    #     e = sys.exc_info()[1]
    #     message = "%s at %s -> %r" % (
    #         e, stream.used, stream.peek())
    #     e.msg = message
    #     e.args = tuple([message])
    #     raise

def parse_selector_group(stream):
    stream.skip_whitespace()
    while 1:
        yield Selector(*parse_selector(stream))
        if stream.peek() == ('DELIM', ','):
            stream.next()
            stream.skip_whitespace()
        else:
            break


def parse_selector(stream):
    result, pseudo_element = parse_simple_selector(stream)
    while 1:
        stream.skip_whitespace()
        peek = stream.peek()
        if peek in (('EOF', None), ('DELIM', ',')):
            break
        if pseudo_element:
            raise SelectorSyntaxError(
                'Got pseudo-element ::%s not at the end of a selector'
                % pseudo_element)
        if peek.is_delim('+', '>', '~'):
            # A combinator
            combinator = stream.next().value
            stream.skip_whitespace()
        else:
            # By exclusion, the last parse_simple_selector() ended
            # at peek == ' '
            combinator = ' '
        next_selector, pseudo_element = parse_simple_selector(stream)
        result = CombinedSelector(result, combinator, next_selector)
    return result, pseudo_element


special_pseudo_elements = (
    'first-line', 'first-letter', 'before', 'after')

def parse_simple_selector(stream, inside_negation=False):
    stream.skip_whitespace()
    selector_start = len(stream.used)
    peek = stream.peek()
    if peek.type == 'IDENT' or peek == ('DELIM', '*'):
        if peek.type == 'IDENT':
            namespace = stream.next().value
        else:
            stream.next()
            namespace = None
        if stream.peek() == ('DELIM', '|'):
            stream.next()
            element = stream.next_ident_or_star()
        else:
            element = namespace
            namespace = None
    else:
        element = namespace = None
    result = Element(namespace, element)
    pseudo_element = None
    while 1:
        peek = stream.peek()
        if peek.type in ('S', 'EOF') or peek.is_delim(',', '+', '>', '~') or (
                inside_negation and peek == ('DELIM', ')')):
            break
        if pseudo_element:
            raise SelectorSyntaxError(
                'Got pseudo-element ::%s not at the end of a selector'
                % pseudo_element)
        if peek.type == 'HASH':
            result = Hash(result, stream.next().value)
        elif peek == ('DELIM', '.'):
            stream.next()
            result = Class(result, stream.next_ident())
        elif peek == ('DELIM', '['):
            stream.next()
            result = parse_attrib(result, stream)
        elif peek == ('DELIM', ':'):
            stream.next()
            if stream.peek() == ('DELIM', ':'):
                stream.next()
                pseudo_element = stream.next_ident()
                if stream.peek() == ('DELIM', '('):
                    stream.next()
                    pseudo_element = FunctionalPseudoElement(
                        pseudo_element, parse_arguments(stream))
                continue
            ident = stream.next_ident()
            if ident.lower() in special_pseudo_elements:
                # Special case: CSS 2.1 pseudo-elements can have a single ':'
                # Any new pseudo-element must have two.
                pseudo_element = unicode_type(ident)
                continue
            if stream.peek() != ('DELIM', '('):
                result = Pseudo(result, ident)
                continue
            stream.next()
            stream.skip_whitespace()
            if ident.lower() == 'not':
                if inside_negation:
                    raise SelectorSyntaxError('Got nested :not()')
                argument, argument_pseudo_element = parse_simple_selector(
                    stream, inside_negation=True)
                next = stream.next()
                if argument_pseudo_element:
                    raise SelectorSyntaxError(
                        'Got pseudo-element ::%s inside :not() at %s'
                        % (argument_pseudo_element, next.pos))
                if next != ('DELIM', ')'):
                    raise SelectorSyntaxError("Expected ')', got %s" % (next,))
                result = Negation(result, argument)
            else:
                result = Function(result, ident, parse_arguments(stream))
        else:
            raise SelectorSyntaxError(
                "Expected selector, got %s" % (peek,))
    if len(stream.used) == selector_start:
        raise SelectorSyntaxError(
            "Expected selector, got %s" % (stream.peek(),))
    return result, pseudo_element

def parse_arguments(stream):
    arguments = []
    while 1:
        stream.skip_whitespace()
        next = stream.next()
        if next.type in ('IDENT', 'STRING', 'NUMBER') or next in [
                ('DELIM', '+'), ('DELIM', '-')]:
            arguments.append(next)
        elif next == ('DELIM', ')'):
            return arguments
        else:
            raise SelectorSyntaxError(
                "Expected an argument, got %s" % (next,))

def parse_attrib(selector, stream):
    stream.skip_whitespace()
    attrib = stream.next_ident_or_star()
    if attrib is None and stream.peek() != ('DELIM', '|'):
        raise SelectorSyntaxError(
            "Expected '|', got %s" % (stream.peek(),))
    if stream.peek() == ('DELIM', '|'):
        stream.next()
        if stream.peek() == ('DELIM', '='):
            namespace = None
            stream.next()
            op = '|='
        else:
            namespace = attrib
            attrib = stream.next_ident()
            op = None
    else:
        namespace = op = None
    if op is None:
        stream.skip_whitespace()
        next = stream.next()
        if next == ('DELIM', ']'):
            return Attrib(selector, namespace, attrib, 'exists', None)
        elif next == ('DELIM', '='):
            op = '='
        elif next.is_delim('^', '$', '*', '~', '|', '!') and (
                stream.peek() == ('DELIM', '=')):
            op = next.value + '='
            stream.next()
        else:
            raise SelectorSyntaxError(
                "Operator expected, got %s" % (next,))
    stream.skip_whitespace()
    value = stream.next()
    if value.type not in ('IDENT', 'STRING'):
        raise SelectorSyntaxError(
            "Expected string or ident, got %s" % (value,))
    stream.skip_whitespace()
    next = stream.next()
    if next != ('DELIM', ']'):
        raise SelectorSyntaxError(
            "Expected ']', got %s" % (next,))
    return Attrib(selector, namespace, attrib, op, value.value)

def parse_series(tokens):
    """
    Parses the arguments for :nth-child() and friends.

    :param tokens: A list of tokens
    :raises: :class:`ValueError` if a string token is present.
    :returns: ``(a, b)``

    """
    for token in tokens:
        if token.type == 'STRING':
            raise ValueError('String tokens not allowed in series.')
    s = ''.join(token.value for token in tokens).strip()
    if s == 'odd':
        return (2, 1)
    elif s == 'even':
        return (2, 0)
    elif s == 'n':
        return (1, 0)
    if 'n' not in s:
        # Just b
        return (0, int(s))
    a, b = s.split('n', 1)
    if not a:
        a = 1
    elif a == '-' or a == '+':
        a = int(a + '1')
    else:
        a = int(a)
    if not b:
        b = 0
    else:
        b = int(b)
    return (a, b)
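
As a standalone illustration of the an+b handling in parse_series above, here is a hypothetical self-contained helper (`parse_anb` is not part of this module) that mirrors the same string logic, operating directly on a string instead of a token list:

```python
def parse_anb(s):
    # Mirror of the an+b handling above: 'odd' -> (2, 1), 'even' -> (2, 0),
    # bare integers -> (0, b), and 'an+b' forms like '-n+3' -> (-1, 3).
    s = s.strip().lower()
    if s == 'odd':
        return (2, 1)
    if s == 'even':
        return (2, 0)
    if s == 'n':
        return (1, 0)
    if 'n' not in s:
        # No 'n' term, just a plain offset b
        return (0, int(s))
    a, b = s.split('n', 1)
    # A bare '', '-' or '+' coefficient means 1, -1 or +1 respectively
    a = int(a + '1') if a in ('', '-', '+') else int(a)
    b = int(b) if b else 0
    return (a, b)
```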


# Token objects

class Token(tuple):

    def __new__(cls, type_, value, pos):
        obj = tuple.__new__(cls, (type_, value))
        obj.pos = pos
        return obj

    def __repr__(self):
        return "<%s '%s' at %i>" % (self.type, self.value, self.pos)

    def is_delim(self, *values):
        return self.type == 'DELIM' and self.value in values

    type = property(operator.itemgetter(0))
    value = property(operator.itemgetter(1))


class EOFToken(Token):

    def __new__(cls, pos):
        return Token.__new__(cls, 'EOF', None, pos)

    def __repr__(self):
        return '<%s at %i>' % (self.type, self.pos)


# Tokenizer

class TokenMacros:
    unicode_escape = r'\\([0-9a-f]{1,6})(?:\r\n|[ \n\r\t\f])?'
    escape = unicode_escape + r'|\\[^\n\r\f0-9a-f]'
    string_escape = r'\\(?:\n|\r\n|\r|\f)|' + escape
    nonascii = r'[^\0-\177]'
    nmchar = '[_a-z0-9-]|%s|%s' % (escape, nonascii)
    nmstart = '[_a-z]|%s|%s' % (escape, nonascii)


def _compile(pattern):
    return re.compile(pattern % vars(TokenMacros), re.IGNORECASE).match


_match_whitespace = _compile(r'[ \t\r\n\f]+')
_match_number = _compile(r'[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)')
_match_hash = _compile('#(?:%(nmchar)s)+')
_match_ident = _compile('-?(?:%(nmstart)s)(?:%(nmchar)s)*')
_match_string_by_quote = {
    "'": _compile(r"([^\n\r\f\\']|%(string_escape)s)*"),
    '"': _compile(r'([^\n\r\f\\"]|%(string_escape)s)*'),
}

_sub_simple_escape = re.compile(r'\\(.)').sub
_sub_unicode_escape = re.compile(TokenMacros.unicode_escape, re.I).sub
_sub_newline_escape = re.compile(r'\\(?:\n|\r\n|\r|\f)').sub

# Same as r'\1', but faster on CPython
if hasattr(operator, 'methodcaller'):
    # Python 2.6+
    _replace_simple = operator.methodcaller('group', 1)
else:
    def _replace_simple(match):
        return match.group(1)


def _replace_unicode(match):
    codepoint = int(match.group(1), 16)
    if codepoint > sys.maxunicode:
        codepoint = 0xFFFD
    return codepoint_to_chr(codepoint)


def unescape_ident(value):
    value = _sub_unicode_escape(_replace_unicode, value)
    value = _sub_simple_escape(_replace_simple, value)
    return value

def tokenize(s):
    pos = 0
    len_s = len(s)
    while pos < len_s:
        match = _match_whitespace(s, pos=pos)
        if match:
            yield Token('S', ' ', pos)
            pos = match.end()
            continue

        match = _match_ident(s, pos=pos)
        if match:
            value = _sub_simple_escape(_replace_simple,
                    _sub_unicode_escape(_replace_unicode, match.group()))
            yield Token('IDENT', value, pos)
            pos = match.end()
            continue

        match = _match_hash(s, pos=pos)
        if match:
            value = _sub_simple_escape(_replace_simple,
                    _sub_unicode_escape(_replace_unicode, match.group()[1:]))
            yield Token('HASH', value, pos)
            pos = match.end()
            continue

        quote = s[pos]
        if quote in _match_string_by_quote:
            match = _match_string_by_quote[quote](s, pos=pos + 1)
            assert match, 'Should have found at least an empty match'
            end_pos = match.end()
            if end_pos == len_s:
                raise SelectorSyntaxError('Unclosed string at %s' % pos)
            if s[end_pos] != quote:
                raise SelectorSyntaxError('Invalid string at %s' % pos)
            value = _sub_simple_escape(_replace_simple,
                    _sub_unicode_escape(_replace_unicode,
                    _sub_newline_escape('', match.group())))
            yield Token('STRING', value, pos)
            pos = end_pos + 1
            continue

        match = _match_number(s, pos=pos)
        if match:
            value = match.group()
            yield Token('NUMBER', value, pos)
            pos = match.end()
            continue

        pos2 = pos + 2
        if s[pos:pos2] == '/*':
            pos = s.find('*/', pos2)
            if pos == -1:
                pos = len_s
            else:
                pos += 2
            continue

        yield Token('DELIM', s[pos], pos)
        pos += 1

    assert pos == len_s
    yield EOFToken(pos)

class TokenStream(object):

    def __init__(self, tokens, source=None):
        self.used = []
        self.tokens = iter(tokens)
        self.source = source
        self.peeked = None
        self._peeking = False
        try:
            self.next_token = self.tokens.next
        except AttributeError:
            # Python 3
            self.next_token = self.tokens.__next__

    def next(self):
        if self._peeking:
            self._peeking = False
            self.used.append(self.peeked)
            return self.peeked
        else:
            next = self.next_token()
            self.used.append(next)
            return next

    def peek(self):
        if not self._peeking:
            self.peeked = self.next_token()
            self._peeking = True
        return self.peeked

    def next_ident(self):
        next = self.next()
        if next.type != 'IDENT':
            raise SelectorSyntaxError('Expected ident, got %s' % (next,))
        return next.value

    def next_ident_or_star(self):
        next = self.next()
        if next.type == 'IDENT':
            return next.value
        elif next == ('DELIM', '*'):
            return None
        else:
            raise SelectorSyntaxError(
                "Expected ident or '*', got %s" % (next,))

    def skip_whitespace(self):
        peek = self.peek()
        if peek.type == 'S':
            self.next()
694
ebook_converter/css_selectors/select.py
Normal file
@@ -0,0 +1,694 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2015, Kovid Goyal <kovid at kovidgoyal.net>'

import re, itertools
from collections import OrderedDict, defaultdict
from functools import wraps
from itertools import chain

from lxml import etree

from css_selectors.errors import ExpressionError
from css_selectors.parser import parse, ascii_lower, Element, FunctionalPseudoElement
from css_selectors.ordered_set import OrderedSet

from polyglot.builtins import iteritems, itervalues

PARSE_CACHE_SIZE = 200
parse_cache = OrderedDict()
XPATH_CACHE_SIZE = 30
xpath_cache = OrderedDict()

# Test that the string is not empty and does not contain whitespace
is_non_whitespace = re.compile(r'^[^ \t\r\n\f]+$').match

def get_parsed_selector(raw):
    try:
        return parse_cache[raw]
    except KeyError:
        parse_cache[raw] = ans = parse(raw)
        if len(parse_cache) > PARSE_CACHE_SIZE:
            parse_cache.pop(next(iter(parse_cache)))
        return ans


def get_compiled_xpath(expr):
    try:
        return xpath_cache[expr]
    except KeyError:
        xpath_cache[expr] = ans = etree.XPath(expr)
        if len(xpath_cache) > XPATH_CACHE_SIZE:
            xpath_cache.pop(next(iter(xpath_cache)))
        return ans

class AlwaysIn(object):

    def __contains__(self, x):
        return True


always_in = AlwaysIn()


def trace_wrapper(func):
    @wraps(func)
    def trace(*args, **kwargs):
        targs = args[1:] if args and isinstance(args[0], Select) else args
        print('Called:', func.__name__, 'with args:', targs, kwargs or '')
        return func(*args, **kwargs)
    return trace

def normalize_language_tag(tag):
    """Return a set of normalized combinations for a `BCP 47` language tag.

    Example:

    >>> sorted(normalize_language_tag('de_AT-1901'))
    ['de', 'de-1901', 'de-at', 'de-at-1901']
    """
    # normalize:
    tag = ascii_lower(tag).replace('_', '-')
    # split (except singletons, which mark the following tag as non-standard):
    tag = re.sub(r'-([a-zA-Z0-9])-', r'-\1_', tag)
    subtags = [subtag.replace('_', '-') for subtag in tag.split('-')]
    base_tag = (subtags.pop(0),)
    taglist = {base_tag[0]}
    # find all combinations of subtags
    for n in range(len(subtags), 0, -1):
        for tags in itertools.combinations(subtags, n):
            taglist.add('-'.join(base_tag + tags))
    return taglist
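
The combination expansion used above can be sketched standalone. This is a simplified re-implementation for illustration only (`expand_language_tag` is a hypothetical name, and it omits the singleton-protection step that normalize_language_tag performs with `re.sub`):

```python
import itertools

def expand_language_tag(tag):
    # Lowercase, unify separators, then emit the base language joined with
    # every combination of the remaining subtags (longest first).
    subtags = tag.lower().replace('_', '-').split('-')
    base = subtags.pop(0)
    variants = {base}
    for n in range(len(subtags), 0, -1):
        for combo in itertools.combinations(subtags, n):
            variants.add('-'.join((base,) + combo))
    return variants
```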


INAPPROPRIATE_PSEUDO_CLASSES = frozenset((
    'active', 'after', 'disabled', 'visited', 'link', 'before', 'focus',
    'first-letter', 'enabled', 'first-line', 'hover', 'checked', 'target'))

class Select(object):

    '''

    This class implements CSS Level 3 selectors
    (http://www.w3.org/TR/css3-selectors) on an lxml tree, with caching for
    performance. To use:

    >>> from css_selectors import Select
    >>> select = Select(root)  # Where root is an lxml document
    >>> print(tuple(select('p.myclass')))

    Tags are returned in document order. Note that attribute and tag names
    are matched case-insensitively. Class and id values are also matched
    case-insensitively. Namespaces are also ignored (this is for performance
    in the common case). The UI related selectors, such as :enabled,
    :disabled, :checked, :hover, etc., are not implemented. Similarly, the
    non-element related selectors such as ::first-line, ::first-letter,
    ::before, etc. are not implemented.

    WARNING: This class uses internal caches. You *must not* make any changes
    to the lxml tree. If you do make some changes, either create a new Select
    object or call :meth:`invalidate_caches`.

    This class can be easily sub-classed to work with tree implementations
    other than lxml. Simply override the methods in the ``Tree Integration``
    block below.

    The caching works by maintaining internal maps from classes/ids/tag
    names/etc. to node sets. These caches are populated as needed, and used
    for all subsequent selections. Thus, for best performance you should use
    the same selector object for finding the matching nodes for multiple
    queries. Of course, remember not to change the tree in between queries.

    '''

    combinator_mapping = {
        ' ': 'descendant',
        '>': 'child',
        '+': 'direct_adjacent',
        '~': 'indirect_adjacent',
    }

    attribute_operator_mapping = {
        'exists': 'exists',
        '=': 'equals',
        '~=': 'includes',
        '|=': 'dashmatch',
        '^=': 'prefixmatch',
        '$=': 'suffixmatch',
        '*=': 'substringmatch',
    }

    def __init__(self, root, default_lang=None,
                 ignore_inappropriate_pseudo_classes=False,
                 dispatch_map=None, trace=False):
        if hasattr(root, 'getroot'):
            root = root.getroot()
        self.root = root
        self.dispatch_map = dispatch_map or default_dispatch_map
        self.invalidate_caches()
        self.default_lang = default_lang
        if trace:
            self.dispatch_map = {k: trace_wrapper(v) for k, v in
                                 iteritems(self.dispatch_map)}
        if ignore_inappropriate_pseudo_classes:
            self.ignore_inappropriate_pseudo_classes = INAPPROPRIATE_PSEUDO_CLASSES
        else:
            self.ignore_inappropriate_pseudo_classes = frozenset()

    # External API {{{
    def invalidate_caches(self):
        '''Invalidate all caches. You must call this before using this object
        if you have made changes to the HTML tree.'''
        self._element_map = None
        self._id_map = None
        self._class_map = None
        self._attrib_map = None
        self._attrib_space_map = None
        self._lang_map = None
        self.map_tag_name = ascii_lower
        if '{' in self.root.tag:
            def map_tag_name(x):
                return ascii_lower(x.rpartition('}')[2])
            self.map_tag_name = map_tag_name

    def __call__(self, selector, root=None):
        '''Return an iterator over all matching tags, in document order.
        Normally, all matching tags in the document are returned; if you
        specify root, then only tags that are root or descendants of root
        are returned. Note that this can be very expensive if root has a
        lot of descendants.'''
        seen = set()
        if root is not None:
            root = frozenset(self.itertag(root))
        for parsed_selector in get_parsed_selector(selector):
            for item in self.iterparsedselector(parsed_selector):
                if item not in seen and (root is None or item in root):
                    yield item
                    seen.add(item)

    def has_matches(self, selector, root=None):
        'Return True iff selector matches at least one item in the tree'
        for elem in self(selector, root=root):
            return True
        return False
    # }}}

    def iterparsedselector(self, parsed_selector):
        type_name = type(parsed_selector).__name__
        try:
            func = self.dispatch_map[ascii_lower(type_name)]
        except KeyError:
            raise ExpressionError('%s is not supported' % type_name)
        for item in func(self, parsed_selector):
            yield item

    @property
    def element_map(self):
        if self._element_map is None:
            self._element_map = em = defaultdict(OrderedSet)
            for tag in self.itertag():
                em[self.map_tag_name(tag.tag)].add(tag)
        return self._element_map

    @property
    def id_map(self):
        if self._id_map is None:
            self._id_map = im = defaultdict(OrderedSet)
            lower = ascii_lower
            for elem in self.iteridtags():
                im[lower(elem.get('id'))].add(elem)
        return self._id_map

    @property
    def class_map(self):
        if self._class_map is None:
            self._class_map = cm = defaultdict(OrderedSet)
            lower = ascii_lower
            for elem in self.iterclasstags():
                for cls in elem.get('class').split():
                    cm[lower(cls)].add(elem)
        return self._class_map

    @property
    def attrib_map(self):
        if self._attrib_map is None:
            self._attrib_map = am = defaultdict(lambda: defaultdict(OrderedSet))
            map_attrib_name = ascii_lower
            if '{' in self.root.tag:
                def map_attrib_name(x):
                    return ascii_lower(x.rpartition('}')[2])
            for tag in self.itertag():
                for attr, val in iteritems(tag.attrib):
                    am[map_attrib_name(attr)][val].add(tag)
        return self._attrib_map

    @property
    def attrib_space_map(self):
        if self._attrib_space_map is None:
            self._attrib_space_map = am = defaultdict(lambda: defaultdict(OrderedSet))
            map_attrib_name = ascii_lower
            if '{' in self.root.tag:
                def map_attrib_name(x):
                    return ascii_lower(x.rpartition('}')[2])
            for tag in self.itertag():
                for attr, val in iteritems(tag.attrib):
                    for v in val.split():
                        am[map_attrib_name(attr)][v].add(tag)
        return self._attrib_space_map

    @property
    def lang_map(self):
        if self._lang_map is None:
            self._lang_map = lm = defaultdict(OrderedSet)
            dl = normalize_language_tag(self.default_lang) if self.default_lang else None
            lmap = {tag: dl for tag in self.itertag()} if dl else {}
            for tag in self.itertag():
                lang = None
                for attr in ('{http://www.w3.org/XML/1998/namespace}lang', 'lang'):
                    lang = tag.get(attr)
                    if lang:
                        break
                if lang:
                    lang = normalize_language_tag(lang)
                    for dtag in self.itertag(tag):
                        lmap[dtag] = lang
            for tag, langs in iteritems(lmap):
                for lang in langs:
                    lm[lang].add(tag)
        return self._lang_map

    # Tree Integration {{{
    def itertag(self, tag=None):
        return (self.root if tag is None else tag).iter('*')

    def iterdescendants(self, tag=None):
        return (self.root if tag is None else tag).iterdescendants('*')

    def iterchildren(self, tag=None):
        return (self.root if tag is None else tag).iterchildren('*')

    def itersiblings(self, tag=None, preceding=False):
        return (self.root if tag is None else tag).itersiblings(
            '*', preceding=preceding)

    def iteridtags(self):
        return get_compiled_xpath('//*[@id]')(self.root)

    def iterclasstags(self):
        return get_compiled_xpath('//*[@class]')(self.root)

    def sibling_count(self, child, before=True, same_type=False):
        '''Return the number of siblings before or after child, or raise
        ValueError if child has no parent.'''
        parent = child.getparent()
        if parent is None:
            raise ValueError('Child has no parent')
        if same_type:
            siblings = OrderedSet(child.itersiblings(preceding=before))
            return len(self.element_map[self.map_tag_name(child.tag)] & siblings)
        else:
            if before:
                return parent.index(child)
            return len(parent) - parent.index(child) - 1

    def all_sibling_count(self, child, same_type=False):
        '''Return the number of siblings of child, or raise ValueError if
        child has no parent.'''
        parent = child.getparent()
        if parent is None:
            raise ValueError('Child has no parent')
        if same_type:
            siblings = OrderedSet(chain(child.itersiblings(preceding=False),
                                        child.itersiblings(preceding=True)))
            return len(self.element_map[self.map_tag_name(child.tag)] & siblings)
        else:
            return len(parent) - 1

    def is_empty(self, elem):
        'Return True iff elem has no child tags and no text content'
        for child in elem:
            # Check for comment/PI nodes with tail text
            if child.tail:
                return False
        return len(tuple(elem.iterchildren('*'))) == 0 and not elem.text

    # }}}

# Combinators {{{

def select_combinedselector(cache, combined):
|
||||||
|
"""Translate a combined selector."""
|
||||||
|
combinator = cache.combinator_mapping[combined.combinator]
|
||||||
|
# Fast path for when the sub-selector is all elements
|
||||||
|
right = None if isinstance(combined.subselector, Element) and (
|
||||||
|
combined.subselector.element or '*') == '*' else cache.iterparsedselector(combined.subselector)
|
||||||
|
for item in cache.dispatch_map[combinator](cache, cache.iterparsedselector(combined.selector), right):
|
||||||
|
yield item
|
||||||
|
|
||||||
|
|
||||||
|
def select_descendant(cache, left, right):
|
||||||
|
"""right is a child, grand-child or further descendant of left"""
|
||||||
|
right = always_in if right is None else frozenset(right)
|
||||||
|
for ancestor in left:
|
||||||
|
for descendant in cache.iterdescendants(ancestor):
|
||||||
|
if descendant in right:
|
||||||
|
yield descendant
|
||||||
|
|
||||||
|
|
||||||
|
def select_child(cache, left, right):
|
||||||
|
"""right is an immediate child of left"""
|
||||||
|
right = always_in if right is None else frozenset(right)
|
||||||
|
for parent in left:
|
||||||
|
for child in cache.iterchildren(parent):
|
||||||
|
if child in right:
|
||||||
|
yield child
|
||||||
|
|
||||||
|
|
||||||
|
def select_direct_adjacent(cache, left, right):
|
||||||
|
"""right is a sibling immediately after left"""
|
||||||
|
right = always_in if right is None else frozenset(right)
|
||||||
|
for parent in left:
|
||||||
|
for sibling in cache.itersiblings(parent):
|
||||||
|
if sibling in right:
|
||||||
|
yield sibling
|
||||||
|
break
|
||||||
|
|
||||||
|
|
||||||
|
def select_indirect_adjacent(cache, left, right):
|
||||||
|
"""right is a sibling after left, immediately or not"""
|
||||||
|
right = always_in if right is None else frozenset(right)
|
||||||
|
for parent in left:
|
||||||
|
for sibling in cache.itersiblings(parent):
|
||||||
|
if sibling in right:
|
||||||
|
yield sibling
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
|
||||||
|
def select_element(cache, selector):
|
||||||
|
"""A type or universal selector."""
|
||||||
|
element = selector.element
|
||||||
|
if not element or element == '*':
|
||||||
|
for elem in cache.itertag():
|
||||||
|
yield elem
|
||||||
|
else:
|
||||||
|
for elem in cache.element_map[ascii_lower(element)]:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
|
||||||
|
def select_hash(cache, selector):
|
||||||
|
'An id selector'
|
||||||
|
items = cache.id_map[ascii_lower(selector.id)]
|
||||||
|
if len(items) > 0:
|
||||||
|
for elem in cache.iterparsedselector(selector.selector):
|
||||||
|
if elem in items:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
|
||||||
|
def select_class(cache, selector):
|
||||||
|
'A class selector'
|
||||||
|
items = cache.class_map[ascii_lower(selector.class_name)]
|
||||||
|
if items:
|
||||||
|
for elem in cache.iterparsedselector(selector.selector):
|
||||||
|
if elem in items:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
|
||||||
|
def select_negation(cache, selector):
|
||||||
|
'Implement :not()'
|
||||||
|
exclude = frozenset(cache.iterparsedselector(selector.subselector))
|
||||||
|
for item in cache.iterparsedselector(selector.selector):
|
||||||
|
if item not in exclude:
|
||||||
|
yield item
|
||||||
|
|
||||||
|
# Attribute selectors {{{
|
||||||
|
|
||||||
|
|
||||||
|
def select_attrib(cache, selector):
|
||||||
|
operator = cache.attribute_operator_mapping[selector.operator]
|
||||||
|
items = frozenset(cache.dispatch_map[operator](cache, ascii_lower(selector.attrib), selector.value))
|
||||||
|
for item in cache.iterparsedselector(selector.selector):
|
||||||
|
if item in items:
|
||||||
|
yield item
|
||||||
|
|
||||||
|
|
||||||
|
def select_exists(cache, attrib, value=None):
|
||||||
|
for elem_set in itervalues(cache.attrib_map[attrib]):
|
||||||
|
for elem in elem_set:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
|
||||||
|
def select_equals(cache, attrib, value):
|
||||||
|
for elem in cache.attrib_map[attrib][value]:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
|
||||||
|
def select_includes(cache, attrib, value):
|
||||||
|
if is_non_whitespace(value):
|
||||||
|
for elem in cache.attrib_space_map[attrib][value]:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
|
||||||
|
def select_dashmatch(cache, attrib, value):
|
||||||
|
if value:
|
||||||
|
for val, elem_set in iteritems(cache.attrib_map[attrib]):
|
||||||
|
if val == value or val.startswith(value + '-'):
|
||||||
|
for elem in elem_set:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
|
||||||
|
def select_prefixmatch(cache, attrib, value):
|
||||||
|
if value:
|
||||||
|
for val, elem_set in iteritems(cache.attrib_map[attrib]):
|
||||||
|
if val.startswith(value):
|
||||||
|
for elem in elem_set:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
|
||||||
|
def select_suffixmatch(cache, attrib, value):
|
||||||
|
if value:
|
||||||
|
for val, elem_set in iteritems(cache.attrib_map[attrib]):
|
||||||
|
if val.endswith(value):
|
||||||
|
for elem in elem_set:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
|
||||||
|
def select_substringmatch(cache, attrib, value):
|
||||||
|
if value:
|
||||||
|
for val, elem_set in iteritems(cache.attrib_map[attrib]):
|
||||||
|
if value in val:
|
||||||
|
for elem in elem_set:
|
||||||
|
yield elem
|
||||||
|
|
||||||
|
# }}}
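The generators above implement the CSS attribute operators over prebuilt attribute maps. A standalone sketch of the same operator semantics applied to a single attribute value (the `matches_attr` helper is a hypothetical illustration, not part of this module):

```python
def matches_attr(op, attr_value, value):
    # Mirrors select_exists/equals/includes/dashmatch/prefixmatch/
    # suffixmatch/substringmatch, one value at a time.
    if op == 'exists':
        return attr_value is not None
    if attr_value is None or not value:
        return False
    return {
        '=':  attr_value == value,
        '~=': value in attr_value.split(),  # whitespace-separated word
        '|=': attr_value == value or attr_value.startswith(value + '-'),
        '^=': attr_value.startswith(value),
        '$=': attr_value.endswith(value),
        '*=': value in attr_value,
    }[op]

print(matches_attr('~=', 'a b c', 'b'))   # True
print(matches_attr('|=', 'en-us', 'en'))  # True
print(matches_attr('|=', 'english', 'en'))  # False
```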

# Function selectors {{{


def select_function(cache, function):
    """Select with a functional pseudo-class."""
    fname = function.name.replace('-', '_')
    try:
        func = cache.dispatch_map[fname]
    except KeyError:
        raise ExpressionError(
            "The pseudo-class :%s() is unknown" % function.name)
    if fname == 'lang':
        items = frozenset(func(cache, function))
        for item in cache.iterparsedselector(function.selector):
            if item in items:
                yield item
    else:
        for item in cache.iterparsedselector(function.selector):
            if func(cache, function, item):
                yield item


def select_lang(cache, function):
    ' Implement :lang() '
    if function.argument_types() not in (['STRING'], ['IDENT']):
        raise ExpressionError("Expected a single string or ident for :lang(), got %r" % function.arguments)
    lang = function.arguments[0].value
    if lang:
        lang = ascii_lower(lang)
        lp = lang + '-'
        for tlang, elem_set in iteritems(cache.lang_map):
            if tlang == lang or (tlang is not None and tlang.startswith(lp)):
                for elem in elem_set:
                    yield elem


def select_nth_child(cache, function, elem):
    ' Implement :nth-child() '
    a, b = function.parsed_arguments
    try:
        num = cache.sibling_count(elem) + 1
    except ValueError:
        return False
    if a == 0:
        return num == b
    n = (num - b) / a
    return n.is_integer() and n > -1


def select_nth_last_child(cache, function, elem):
    ' Implement :nth-last-child() '
    a, b = function.parsed_arguments
    try:
        num = cache.sibling_count(elem, before=False) + 1
    except ValueError:
        return False
    if a == 0:
        return num == b
    n = (num - b) / a
    return n.is_integer() and n > -1


def select_nth_of_type(cache, function, elem):
    ' Implement :nth-of-type() '
    a, b = function.parsed_arguments
    try:
        num = cache.sibling_count(elem, same_type=True) + 1
    except ValueError:
        return False
    if a == 0:
        return num == b
    n = (num - b) / a
    return n.is_integer() and n > -1


def select_nth_last_of_type(cache, function, elem):
    ' Implement :nth-last-of-type() '
    a, b = function.parsed_arguments
    try:
        num = cache.sibling_count(elem, before=False, same_type=True) + 1
    except ValueError:
        return False
    if a == 0:
        return num == b
    n = (num - b) / a
    return n.is_integer() and n > -1

# }}}
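All four `nth-*` selectors above share the same check: with the 1-based position `num` and the arguments `a`, `b` parsed from `an+b`, an element matches iff `n = (num - b) / a` is a non-negative integer (or `num == b` when `a` is zero). A standalone sketch of that arithmetic (`matches_nth` is a hypothetical helper name):

```python
def matches_nth(num, a, b):
    # num is the 1-based element position; a, b come from the an+b argument.
    if a == 0:
        return num == b
    n = (num - b) / a
    # n must be an integer >= 0, i.e. some non-negative step count
    return n.is_integer() and n > -1

# :nth-child(2n+1) selects the odd positions
print([num for num in range(1, 8) if matches_nth(num, 2, 1)])  # [1, 3, 5, 7]
```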

# Pseudo elements {{{


def pseudo_func(f):
    f.is_pseudo = True
    return f


@pseudo_func
def allow_all(cache, item):
    return True


def get_func_for_pseudo(cache, ident):
    try:
        func = cache.dispatch_map[ident.replace('-', '_')]
    except KeyError:
        if ident in cache.ignore_inappropriate_pseudo_classes:
            func = allow_all
        else:
            raise ExpressionError(
                "The pseudo-class :%s is not supported" % ident)

    try:
        func.is_pseudo
    except AttributeError:
        raise ExpressionError(
            "The pseudo-class :%s is invalid" % ident)
    return func


def select_selector(cache, selector):
    if selector.pseudo_element is None:
        for item in cache.iterparsedselector(selector.parsed_tree):
            yield item
        return
    if isinstance(selector.pseudo_element, FunctionalPseudoElement):
        raise ExpressionError(
            "The pseudo-element ::%s is not supported" % selector.pseudo_element.name)
    func = get_func_for_pseudo(cache, selector.pseudo_element)
    for item in cache.iterparsedselector(selector.parsed_tree):
        if func(cache, item):
            yield item


def select_pseudo(cache, pseudo):
    func = get_func_for_pseudo(cache, pseudo.ident)
    if func is select_root:
        yield cache.root
        return

    for item in cache.iterparsedselector(pseudo.selector):
        if func(cache, item):
            yield item


@pseudo_func
def select_root(cache, elem):
    return elem is cache.root


@pseudo_func
def select_first_child(cache, elem):
    try:
        return cache.sibling_count(elem) == 0
    except ValueError:
        return False


@pseudo_func
def select_last_child(cache, elem):
    try:
        return cache.sibling_count(elem, before=False) == 0
    except ValueError:
        return False


@pseudo_func
def select_only_child(cache, elem):
    try:
        return cache.all_sibling_count(elem) == 0
    except ValueError:
        return False


@pseudo_func
def select_first_of_type(cache, elem):
    try:
        return cache.sibling_count(elem, same_type=True) == 0
    except ValueError:
        return False


@pseudo_func
def select_last_of_type(cache, elem):
    try:
        return cache.sibling_count(elem, before=False, same_type=True) == 0
    except ValueError:
        return False


@pseudo_func
def select_only_of_type(cache, elem):
    try:
        return cache.all_sibling_count(elem, same_type=True) == 0
    except ValueError:
        return False


@pseudo_func
def select_empty(cache, elem):
    return cache.is_empty(elem)


# }}}

default_dispatch_map = {name.partition('_')[2]: obj for name, obj in globals().items() if name.startswith('select_') and callable(obj)}
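The dispatch map above is built purely by naming convention: every module-level `select_<name>` callable becomes the handler for `<name>`, with `name.partition('_')[2]` stripping the `select_` prefix. A minimal sketch of the same pattern in isolation (the `select_foo`/`select_bar` handlers are hypothetical):

```python
def select_foo(arg):
    return 'foo:' + arg

def select_bar(arg):
    return 'bar:' + arg

# Same construction as default_dispatch_map: strip the 'select_' prefix
# and register every matching callable in the module namespace.
dispatch = {name.partition('_')[2]: obj for name, obj in globals().items()
            if name.startswith('select_') and callable(obj)}

print(dispatch['foo']('x'))  # foo:x
print(dispatch['bar']('y'))  # bar:y
```

Adding a new selector handler then only requires defining a suitably named function; no registry edit is needed.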

if __name__ == '__main__':
    from pprint import pprint
    root = etree.fromstring(
        '<body xmlns="xxx" xml:lang="en"><p id="p" class="one two" lang="fr"><a id="a"/><b/><c/><d/></p></body>',
        parser=etree.XMLParser(recover=True, no_network=True, resolve_entities=False))
    select = Select(root, ignore_inappropriate_pseudo_classes=True, trace=True)
    pprint(list(select('p:disabled')))
843 ebook_converter/css_selectors/tests.py Normal file
@@ -0,0 +1,843 @@
#!/usr/bin/env python2
|
||||||
|
# vim:fileencoding=utf-8
|
||||||
|
from __future__ import absolute_import, division, print_function, unicode_literals
|
||||||
|
|
||||||
|
__license__ = 'GPL v3'
|
||||||
|
__copyright__ = '2015, Kovid Goyal <kovid at kovidgoyal.net>'
|
||||||
|
|
||||||
|
import unittest, sys, argparse
|
||||||
|
|
||||||
|
from lxml import etree, html
|
||||||
|
|
||||||
|
from css_selectors.errors import SelectorSyntaxError, ExpressionError
|
||||||
|
from css_selectors.parser import tokenize, parse
|
||||||
|
from css_selectors.select import Select
|
||||||
|
|
||||||
|
|
||||||
|
class TestCSSSelectors(unittest.TestCase):
|
||||||
|
|
||||||
|
# Test data {{{
|
||||||
|
HTML_IDS = '''
|
||||||
|
<html id="html"><head>
|
||||||
|
<link id="link-href" href="foo" />
|
||||||
|
<link id="link-nohref" />
|
||||||
|
</head><body>
|
||||||
|
<div id="outer-div">
|
||||||
|
<a id="name-anchor" name="foo"></a>
|
||||||
|
<a id="tag-anchor" rel="tag" href="http://localhost/foo">link</a>
|
||||||
|
<a id="nofollow-anchor" rel="nofollow" href="https://example.org">
|
||||||
|
link</a>
|
||||||
|
<ol id="first-ol" class="a b c">
|
||||||
|
<li id="first-li">content</li>
|
||||||
|
<li id="second-li" lang="En-us">
|
||||||
|
<div id="li-div">
|
||||||
|
</div>
|
||||||
|
</li>
|
||||||
|
<li id="third-li" class="ab c"></li>
|
||||||
|
<li id="fourth-li" class="ab
|
||||||
|
c"></li>
|
||||||
|
<li id="fifth-li"></li>
|
||||||
|
<li id="sixth-li"></li>
|
||||||
|
<li id="seventh-li"> </li>
|
||||||
|
</ol>
|
||||||
|
<p id="paragraph">
|
||||||
|
<b id="p-b">hi</b> <em id="p-em">there</em>
|
||||||
|
<b id="p-b2">guy</b>
|
||||||
|
<input type="checkbox" id="checkbox-unchecked" />
|
||||||
|
<input type="checkbox" id="checkbox-disabled" disabled="" />
|
||||||
|
<input type="text" id="text-checked" checked="checked" />
|
||||||
|
<input type="hidden" />
|
||||||
|
<input type="hidden" disabled="disabled" />
|
||||||
|
<input type="checkbox" id="checkbox-checked" checked="checked" />
|
||||||
|
<input type="checkbox" id="checkbox-disabled-checked"
|
||||||
|
disabled="disabled" checked="checked" />
|
||||||
|
<fieldset id="fieldset" disabled="disabled">
|
||||||
|
<input type="checkbox" id="checkbox-fieldset-disabled" />
|
||||||
|
<input type="hidden" />
|
||||||
|
</fieldset>
|
||||||
|
</p>
|
||||||
|
<ol id="second-ol">
|
||||||
|
</ol>
|
||||||
|
<map name="dummymap">
|
||||||
|
<area shape="circle" coords="200,250,25" href="foo.html" id="area-href" />
|
||||||
|
<area shape="default" id="area-nohref" />
|
||||||
|
</map>
|
||||||
|
</div>
|
||||||
|
<div id="foobar-div" foobar="ab bc
|
||||||
|
cde"><span id="foobar-span"></span></div>
|
||||||
|
</body></html>
|
||||||
|
'''
|
||||||
|
HTML_SHAKESPEARE = '''
|
||||||
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
||||||
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||||
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" debug="true">
|
||||||
|
<head>
|
||||||
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<div id="test">
|
||||||
|
<div class="dialog">
|
||||||
|
<h2>As You Like It</h2>
|
||||||
|
<div id="playwright">
|
||||||
|
by William Shakespeare
|
||||||
|
</div>
|
||||||
|
<div class="dialog scene thirdClass" id="scene1">
|
||||||
|
<h3>ACT I, SCENE III. A room in the palace.</h3>
|
||||||
|
<div class="dialog">
|
||||||
|
<div class="direction">Enter CELIA and ROSALIND</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech1" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.1">Why, cousin! why, Rosalind! Cupid have mercy! not a word?</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech2" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.2">Not one to throw at a dog.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech3" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.3">No, thy words are too precious to be cast away upon</div>
|
||||||
|
<div id="scene1.3.4">curs; throw some of them at me; come, lame me with reasons.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech4" class="character">ROSALIND</div>
|
||||||
|
<div id="speech5" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.8">But is all this for your father?</div>
|
||||||
|
</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.5">Then there were two cousins laid up; when the one</div>
|
||||||
|
<div id="scene1.3.6">should be lamed with reasons and the other mad</div>
|
||||||
|
<div id="scene1.3.7">without any.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech6" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.9">No, some of it is for my child's father. O, how</div>
|
||||||
|
<div id="scene1.3.10">full of briers is this working-day world!</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech7" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.11">They are but burs, cousin, thrown upon thee in</div>
|
||||||
|
<div id="scene1.3.12">holiday foolery: if we walk not in the trodden</div>
|
||||||
|
<div id="scene1.3.13">paths our very petticoats will catch them.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech8" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.14">I could shake them off my coat: these burs are in my heart.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech9" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.15">Hem them away.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech10" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.16">I would try, if I could cry 'hem' and have him.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech11" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.17">Come, come, wrestle with thy affections.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech12" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.18">O, they take the part of a better wrestler than myself!</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech13" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.19">O, a good wish upon you! you will try in time, in</div>
|
||||||
|
<div id="scene1.3.20">despite of a fall. But, turning these jests out of</div>
|
||||||
|
<div id="scene1.3.21">service, let us talk in good earnest: is it</div>
|
||||||
|
<div id="scene1.3.22">possible, on such a sudden, you should fall into so</div>
|
||||||
|
<div id="scene1.3.23">strong a liking with old Sir Rowland's youngest son?</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech14" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.24">The duke my father loved his father dearly.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech15" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.25">Doth it therefore ensue that you should love his son</div>
|
||||||
|
<div id="scene1.3.26">dearly? By this kind of chase, I should hate him,</div>
|
||||||
|
<div id="scene1.3.27">for my father hated his father dearly; yet I hate</div>
|
||||||
|
<div id="scene1.3.28">not Orlando.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech16" class="character">ROSALIND</div>
|
||||||
|
<div title="wtf" class="dialog">
|
||||||
|
<div id="scene1.3.29">No, faith, hate him not, for my sake.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech17" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.30">Why should I not? doth he not deserve well?</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech18" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.31">Let me love him for that, and do you love him</div>
|
||||||
|
<div id="scene1.3.32">because I do. Look, here comes the duke.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech19" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.33">With his eyes full of anger.</div>
|
||||||
|
<div class="direction">Enter DUKE FREDERICK, with Lords</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech20" class="character">DUKE FREDERICK</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.34">Mistress, dispatch you with your safest haste</div>
|
||||||
|
<div id="scene1.3.35">And get you from our court.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech21" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.36">Me, uncle?</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech22" class="character">DUKE FREDERICK</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.37">You, cousin</div>
|
||||||
|
<div id="scene1.3.38">Within these ten days if that thou be'st found</div>
|
||||||
|
<div id="scene1.3.39">So near our public court as twenty miles,</div>
|
||||||
|
<div id="scene1.3.40">Thou diest for it.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech23" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.41"> I do beseech your grace,</div>
|
||||||
|
<div id="scene1.3.42">Let me the knowledge of my fault bear with me:</div>
|
||||||
|
<div id="scene1.3.43">If with myself I hold intelligence</div>
|
||||||
|
<div id="scene1.3.44">Or have acquaintance with mine own desires,</div>
|
||||||
|
<div id="scene1.3.45">If that I do not dream or be not frantic,--</div>
|
||||||
|
<div id="scene1.3.46">As I do trust I am not--then, dear uncle,</div>
|
||||||
|
<div id="scene1.3.47">Never so much as in a thought unborn</div>
|
||||||
|
<div id="scene1.3.48">Did I offend your highness.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech24" class="character">DUKE FREDERICK</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.49">Thus do all traitors:</div>
|
||||||
|
<div id="scene1.3.50">If their purgation did consist in words,</div>
|
||||||
|
<div id="scene1.3.51">They are as innocent as grace itself:</div>
|
||||||
|
<div id="scene1.3.52">Let it suffice thee that I trust thee not.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech25" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.53">Yet your mistrust cannot make me a traitor:</div>
|
||||||
|
<div id="scene1.3.54">Tell me whereon the likelihood depends.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech26" class="character">DUKE FREDERICK</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.55">Thou art thy father's daughter; there's enough.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech27" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.56">So was I when your highness took his dukedom;</div>
|
||||||
|
<div id="scene1.3.57">So was I when your highness banish'd him:</div>
|
||||||
|
<div id="scene1.3.58">Treason is not inherited, my lord;</div>
|
||||||
|
<div id="scene1.3.59">Or, if we did derive it from our friends,</div>
|
||||||
|
<div id="scene1.3.60">What's that to me? my father was no traitor:</div>
|
||||||
|
<div id="scene1.3.61">Then, good my liege, mistake me not so much</div>
|
||||||
|
<div id="scene1.3.62">To think my poverty is treacherous.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech28" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.63">Dear sovereign, hear me speak.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech29" class="character">DUKE FREDERICK</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.64">Ay, Celia; we stay'd her for your sake,</div>
|
||||||
|
<div id="scene1.3.65">Else had she with her father ranged along.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech30" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.66">I did not then entreat to have her stay;</div>
|
||||||
|
<div id="scene1.3.67">It was your pleasure and your own remorse:</div>
|
||||||
|
<div id="scene1.3.68">I was too young that time to value her;</div>
|
||||||
|
<div id="scene1.3.69">But now I know her: if she be a traitor,</div>
|
||||||
|
<div id="scene1.3.70">Why so am I; we still have slept together,</div>
|
||||||
|
<div id="scene1.3.71">Rose at an instant, learn'd, play'd, eat together,</div>
|
||||||
|
<div id="scene1.3.72">And wheresoever we went, like Juno's swans,</div>
|
||||||
|
<div id="scene1.3.73">Still we went coupled and inseparable.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech31" class="character">DUKE FREDERICK</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.74">She is too subtle for thee; and her smoothness,</div>
|
||||||
|
<div id="scene1.3.75">Her very silence and her patience</div>
|
||||||
|
<div id="scene1.3.76">Speak to the people, and they pity her.</div>
|
||||||
|
<div id="scene1.3.77">Thou art a fool: she robs thee of thy name;</div>
|
||||||
|
<div id="scene1.3.78">And thou wilt show more bright and seem more virtuous</div>
|
||||||
|
<div id="scene1.3.79">When she is gone. Then open not thy lips:</div>
|
||||||
|
<div id="scene1.3.80">Firm and irrevocable is my doom</div>
|
||||||
|
<div id="scene1.3.81">Which I have pass'd upon her; she is banish'd.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech32" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.82">Pronounce that sentence then on me, my liege:</div>
|
||||||
|
<div id="scene1.3.83">I cannot live out of her company.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech33" class="character">DUKE FREDERICK</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.84">You are a fool. You, niece, provide yourself:</div>
|
||||||
|
<div id="scene1.3.85">If you outstay the time, upon mine honour,</div>
|
||||||
|
<div id="scene1.3.86">And in the greatness of my word, you die.</div>
|
||||||
|
<div class="direction">Exeunt DUKE FREDERICK and Lords</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech34" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.87">O my poor Rosalind, whither wilt thou go?</div>
|
||||||
|
<div id="scene1.3.88">Wilt thou change fathers? I will give thee mine.</div>
|
||||||
|
<div id="scene1.3.89">I charge thee, be not thou more grieved than I am.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech35" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.90">I have more cause.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech36" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.91"> Thou hast not, cousin;</div>
|
||||||
|
<div id="scene1.3.92">Prithee be cheerful: know'st thou not, the duke</div>
|
||||||
|
<div id="scene1.3.93">Hath banish'd me, his daughter?</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech37" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.94">That he hath not.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech38" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.95">No, hath not? Rosalind lacks then the love</div>
|
||||||
|
<div id="scene1.3.96">Which teacheth thee that thou and I am one:</div>
|
||||||
|
<div id="scene1.3.97">Shall we be sunder'd? shall we part, sweet girl?</div>
|
||||||
|
<div id="scene1.3.98">No: let my father seek another heir.</div>
|
||||||
|
<div id="scene1.3.99">Therefore devise with me how we may fly,</div>
|
||||||
|
<div id="scene1.3.100">Whither to go and what to bear with us;</div>
|
||||||
|
<div id="scene1.3.101">And do not seek to take your change upon you,</div>
|
||||||
|
<div id="scene1.3.102">To bear your griefs yourself and leave me out;</div>
|
||||||
|
<div id="scene1.3.103">For, by this heaven, now at our sorrows pale,</div>
|
||||||
|
<div id="scene1.3.104">Say what thou canst, I'll go along with thee.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech39" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.105">Why, whither shall we go?</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech40" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.106">To seek my uncle in the forest of Arden.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech41" class="character">ROSALIND</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.107">Alas, what danger will it be to us,</div>
|
||||||
|
<div id="scene1.3.108">Maids as we are, to travel forth so far!</div>
|
||||||
|
<div id="scene1.3.109">Beauty provoketh thieves sooner than gold.</div>
|
||||||
|
</div>
|
||||||
|
<div id="speech42" class="character">CELIA</div>
|
||||||
|
<div class="dialog">
|
||||||
|
<div id="scene1.3.110">I'll put myself in poor and mean attire</div>
|
||||||
|
<div id="scene1.3.111">And with a kind of umber smirch my face;</div>
|
||||||
|
<div id="scene1.3.112">The like do you: so shall we pass along</div>
<div id="scene1.3.113">And never stir assailants.</div>
</div>
<div id="speech43" class="character">ROSALIND</div>
<div class="dialog">
<div id="scene1.3.114">Were it not better,</div>
<div id="scene1.3.115">Because that I am more than common tall,</div>
<div id="scene1.3.116">That I did suit me all points like a man?</div>
<div id="scene1.3.117">A gallant curtle-axe upon my thigh,</div>
<div id="scene1.3.118">A boar-spear in my hand; and--in my heart</div>
<div id="scene1.3.119">Lie there what hidden woman's fear there will--</div>
<div id="scene1.3.120">We'll have a swashing and a martial outside,</div>
<div id="scene1.3.121">As many other mannish cowards have</div>
<div id="scene1.3.122">That do outface it with their semblances.</div>
</div>
<div id="speech44" class="character">CELIA</div>
<div class="dialog">
<div id="scene1.3.123">What shall I call thee when thou art a man?</div>
</div>
<div id="speech45" class="character">ROSALIND</div>
<div class="dialog">
<div id="scene1.3.124">I'll have no worse a name than Jove's own page;</div>
<div id="scene1.3.125">And therefore look you call me Ganymede.</div>
<div id="scene1.3.126">But what will you be call'd?</div>
</div>
<div id="speech46" class="character">CELIA</div>
<div class="dialog">
<div id="scene1.3.127">Something that hath a reference to my state</div>
<div id="scene1.3.128">No longer Celia, but Aliena.</div>
</div>
<div id="speech47" class="character">ROSALIND</div>
<div class="dialog">
<div id="scene1.3.129">But, cousin, what if we assay'd to steal</div>
<div id="scene1.3.130">The clownish fool out of your father's court?</div>
<div id="scene1.3.131">Would he not be a comfort to our travel?</div>
</div>
<div id="speech48" class="character">CELIA</div>
<div class="dialog">
<div id="scene1.3.132">He'll go along o'er the wide world with me;</div>
<div id="scene1.3.133">Leave me alone to woo him. Let's away,</div>
<div id="scene1.3.134">And get our jewels and our wealth together,</div>
<div id="scene1.3.135">Devise the fittest time and safest way</div>
<div id="scene1.3.136">To hide us from pursuit that will be made</div>
<div id="scene1.3.137">After my flight. Now go we in content</div>
<div id="scene1.3.138">To liberty and not to banishment.</div>
<div class="direction">Exeunt</div>
</div>
</div>
</div>
</div>
</body>
</html>
'''

    # }}}

    ae = unittest.TestCase.assertEqual

    def test_tokenizer(self):  # {{{
        tokens = [
            type('')(item) for item in tokenize(
                r'E\ é > f [a~="y\"x"]:nth(/* fu /]* */-3.7)')]
        self.ae(tokens, [
            "<IDENT 'E é' at 0>",
            "<S ' ' at 4>",
            "<DELIM '>' at 5>",
            "<S ' ' at 6>",
            # the no-break space is not whitespace in CSS
            "<IDENT 'f ' at 7>",  # f\xa0
            "<DELIM '[' at 9>",
            "<IDENT 'a' at 10>",
            "<DELIM '~' at 11>",
            "<DELIM '=' at 12>",
            "<STRING 'y\"x' at 13>",
            "<DELIM ']' at 19>",
            "<DELIM ':' at 20>",
            "<IDENT 'nth' at 21>",
            "<DELIM '(' at 24>",
            "<NUMBER '-3.7' at 37>",
            "<DELIM ')' at 41>",
            "<EOF at 42>",
        ])
    # }}}

    def test_parser(self):  # {{{
        def repr_parse(css):
            selectors = parse(css)
            for selector in selectors:
                assert selector.pseudo_element is None
            return [repr(selector.parsed_tree).replace("(u'", "('")
                    for selector in selectors]

        def parse_many(first, *others):
            result = repr_parse(first)
            for other in others:
                assert repr_parse(other) == result
            return result

        assert parse_many('*') == ['Element[*]']
        assert parse_many('*|*') == ['Element[*]']
        assert parse_many('*|foo') == ['Element[foo]']
        assert parse_many('foo|*') == ['Element[foo|*]']
        assert parse_many('foo|bar') == ['Element[foo|bar]']
        # This will never match, but it is valid:
        assert parse_many('#foo#bar') == ['Hash[Hash[Element[*]#foo]#bar]']
        assert parse_many(
            'div>.foo',
            'div> .foo',
            'div >.foo',
            'div > .foo',
            'div \n> \t \t .foo', 'div\r>\n\n\n.foo', 'div\f>\f.foo'
        ) == ['CombinedSelector[Element[div] > Class[Element[*].foo]]']
        assert parse_many('td.foo,.bar',
                          'td.foo, .bar',
                          'td.foo\t\r\n\f ,\t\r\n\f .bar'
                          ) == [
            'Class[Element[td].foo]',
            'Class[Element[*].bar]'
        ]
        assert parse_many('div, td.foo, div.bar span') == [
            'Element[div]',
            'Class[Element[td].foo]',
            'CombinedSelector[Class[Element[div].bar] '
            '<followed> Element[span]]']
        assert parse_many('div > p') == [
            'CombinedSelector[Element[div] > Element[p]]']
        assert parse_many('td:first') == [
            'Pseudo[Element[td]:first]']
        assert parse_many('td :first') == [
            'CombinedSelector[Element[td] '
            '<followed> Pseudo[Element[*]:first]]']
        assert parse_many('a[name]', 'a[ name\t]') == [
            'Attrib[Element[a][name]]']
        assert parse_many('a [name]') == [
            'CombinedSelector[Element[a] <followed> Attrib[Element[*][name]]]']
        self.ae(parse_many('a[rel="include"]', 'a[rel = include]'), [
            "Attrib[Element[a][rel = 'include']]"])
        assert parse_many("a[hreflang |= 'en']", "a[hreflang|=en]") == [
            "Attrib[Element[a][hreflang |= 'en']]"]
        self.ae(parse_many('div:nth-child(10)'), [
            "Function[Element[div]:nth-child(['10'])]"])
        assert parse_many(':nth-child(2n+2)') == [
            "Function[Element[*]:nth-child(['2', 'n', '+2'])]"]
        assert parse_many('div:nth-of-type(10)') == [
            "Function[Element[div]:nth-of-type(['10'])]"]
        assert parse_many('div div:nth-of-type(10) .aclass') == [
            'CombinedSelector[CombinedSelector[Element[div] <followed> '
            "Function[Element[div]:nth-of-type(['10'])]] "
            '<followed> Class[Element[*].aclass]]']
        assert parse_many('label:only') == [
            'Pseudo[Element[label]:only]']
        assert parse_many('a:lang(fr)') == [
            "Function[Element[a]:lang(['fr'])]"]
        assert parse_many('div:contains("foo")') == [
            "Function[Element[div]:contains(['foo'])]"]
        assert parse_many('div#foobar') == [
            'Hash[Element[div]#foobar]']
        assert parse_many('div:not(div.foo)') == [
            'Negation[Element[div]:not(Class[Element[div].foo])]']
        assert parse_many('td ~ th') == [
            'CombinedSelector[Element[td] ~ Element[th]]']
    # }}}

    def test_pseudo_elements(self):  # {{{
        def parse_pseudo(css):
            result = []
            for selector in parse(css):
                pseudo = selector.pseudo_element
                pseudo = type('')(pseudo) if pseudo else pseudo
                # No Symbol here
                assert pseudo is None or isinstance(pseudo, type(''))
                selector = repr(selector.parsed_tree).replace("(u'", "('")
                result.append((selector, pseudo))
            return result

        def parse_one(css):
            result = parse_pseudo(css)
            assert len(result) == 1
            return result[0]

        self.ae(parse_one('foo'), ('Element[foo]', None))
        self.ae(parse_one('*'), ('Element[*]', None))
        self.ae(parse_one(':empty'), ('Pseudo[Element[*]:empty]', None))

        # Special cases for CSS 2.1 pseudo-elements
        self.ae(parse_one(':BEfore'), ('Element[*]', 'before'))
        self.ae(parse_one(':aftER'), ('Element[*]', 'after'))
        self.ae(parse_one(':First-Line'), ('Element[*]', 'first-line'))
        self.ae(parse_one(':First-Letter'), ('Element[*]', 'first-letter'))

        self.ae(parse_one('::befoRE'), ('Element[*]', 'before'))
        self.ae(parse_one('::AFter'), ('Element[*]', 'after'))
        self.ae(parse_one('::firsT-linE'), ('Element[*]', 'first-line'))
        self.ae(parse_one('::firsT-letteR'), ('Element[*]', 'first-letter'))

        self.ae(parse_one('::text-content'), ('Element[*]', 'text-content'))
        self.ae(parse_one('::attr(name)'), (
            "Element[*]", "FunctionalPseudoElement[::attr(['name'])]"))

        self.ae(parse_one('::Selection'), ('Element[*]', 'selection'))
        self.ae(parse_one('foo:after'), ('Element[foo]', 'after'))
        self.ae(parse_one('foo::selection'), ('Element[foo]', 'selection'))
        self.ae(parse_one('lorem#ipsum ~ a#b.c[href]:empty::selection'), (
            'CombinedSelector[Hash[Element[lorem]#ipsum] ~ '
            'Pseudo[Attrib[Class[Hash[Element[a]#b].c][href]]:empty]]',
            'selection'))

        assert parse_pseudo('foo:before, bar, baz:after') == [
            ('Element[foo]', 'before'),
            ('Element[bar]', None),
            ('Element[baz]', 'after')]
    # }}}

    def test_specificity(self):  # {{{
        def specificity(css):
            selectors = parse(css)
            assert len(selectors) == 1
            return selectors[0].specificity()

        assert specificity('*') == (0, 0, 0)
        assert specificity(' foo') == (0, 0, 1)
        assert specificity(':empty ') == (0, 1, 0)
        assert specificity(':before') == (0, 0, 1)
        assert specificity('*:before') == (0, 0, 1)
        assert specificity(':nth-child(2)') == (0, 1, 0)
        assert specificity('.bar') == (0, 1, 0)
        assert specificity('[baz]') == (0, 1, 0)
        assert specificity('[baz="4"]') == (0, 1, 0)
        assert specificity('[baz^="4"]') == (0, 1, 0)
        assert specificity('#lipsum') == (1, 0, 0)

        assert specificity(':not(*)') == (0, 0, 0)
        assert specificity(':not(foo)') == (0, 0, 1)
        assert specificity(':not(.foo)') == (0, 1, 0)
        assert specificity(':not([foo])') == (0, 1, 0)
        assert specificity(':not(:empty)') == (0, 1, 0)
        assert specificity(':not(#foo)') == (1, 0, 0)

        assert specificity('foo:empty') == (0, 1, 1)
        assert specificity('foo:before') == (0, 0, 2)
        assert specificity('foo::before') == (0, 0, 2)
        assert specificity('foo:empty::before') == (0, 1, 2)

        assert specificity('#lorem + foo#ipsum:first-child > bar:first-line'
                           ) == (2, 1, 3)
    # }}}
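The tuples asserted above are (ID, class, element) counts per the CSS specificity rules. As an illustration only, a naive regex-based counter — a hypothetical helper, not the library's implementation, and limited to simple selectors without `:not()` or namespaces — reproduces several of the counts:

```python
import re


def naive_specificity(selector):
    # Hypothetical helper for illustration; NOT the library's algorithm.
    # (a) ID selectors
    ids = len(re.findall(r'#[\w-]+', selector))
    # (b) classes, attribute selectors and pseudo-classes; the CSS 2.1
    # one-colon pseudo-elements are excluded because they count as elements
    classes = len(re.findall(
        r'\.[\w-]+|\[[^\]]*\]'
        r'|:(?!:)(?!(?:before|after|first-line|first-letter)\b)[\w-]+',
        selector))
    # (c) element names plus pseudo-elements (::x and CSS 2.1 one-colon forms)
    elements = len(re.findall(
        r'(?:^|[\s>+~])[A-Za-z][\w-]*|::[\w-]+'
        r'|:(?:before|after|first-line|first-letter)\b',
        selector))
    return (ids, classes, elements)
```

Under these assumptions `naive_specificity('foo:empty::before')` yields the same `(0, 1, 2)` asserted by the test above.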
    def test_parse_errors(self):  # {{{
        def get_error(css):
            try:
                parse(css)
            except SelectorSyntaxError:
                # Py2, Py3, ...
                return str(sys.exc_info()[1]).replace("(u'", "('")

        self.ae(get_error('attributes(href)/html/body/a'), (
            "Expected selector, got <DELIM '(' at 10>"))
        assert get_error('attributes(href)') == (
            "Expected selector, got <DELIM '(' at 10>")
        assert get_error('html/body/a') == (
            "Expected selector, got <DELIM '/' at 4>")
        assert get_error(' ') == (
            "Expected selector, got <EOF at 1>")
        assert get_error('div, ') == (
            "Expected selector, got <EOF at 5>")
        assert get_error(' , div') == (
            "Expected selector, got <DELIM ',' at 1>")
        assert get_error('p, , div') == (
            "Expected selector, got <DELIM ',' at 3>")
        assert get_error('div > ') == (
            "Expected selector, got <EOF at 6>")
        assert get_error(' > div') == (
            "Expected selector, got <DELIM '>' at 2>")
        assert get_error('foo|#bar') == (
            "Expected ident or '*', got <HASH 'bar' at 4>")
        assert get_error('#.foo') == (
            "Expected selector, got <DELIM '#' at 0>")
        assert get_error('.#foo') == (
            "Expected ident, got <HASH 'foo' at 1>")
        assert get_error(':#foo') == (
            "Expected ident, got <HASH 'foo' at 1>")
        assert get_error('[*]') == (
            "Expected '|', got <DELIM ']' at 2>")
        assert get_error('[foo|]') == (
            "Expected ident, got <DELIM ']' at 5>")
        assert get_error('[#]') == (
            "Expected ident or '*', got <DELIM '#' at 1>")
        assert get_error('[foo=#]') == (
            "Expected string or ident, got <DELIM '#' at 5>")
        assert get_error('[href]a') == (
            "Expected selector, got <IDENT 'a' at 6>")
        assert get_error('[rel=stylesheet]') is None
        assert get_error('[rel:stylesheet]') == (
            "Operator expected, got <DELIM ':' at 4>")
        assert get_error('[rel=stylesheet') == (
            "Expected ']', got <EOF at 15>")
        assert get_error(':lang(fr)') is None
        assert get_error(':lang(fr') == (
            "Expected an argument, got <EOF at 8>")
        assert get_error(':contains("foo') == (
            "Unclosed string at 10")
        assert get_error('foo!') == (
            "Expected selector, got <DELIM '!' at 3>")

        # Mis-placed pseudo-elements
        assert get_error('a:before:empty') == (
            "Got pseudo-element ::before not at the end of a selector")
        assert get_error('li:before a') == (
            "Got pseudo-element ::before not at the end of a selector")
        assert get_error(':not(:before)') == (
            "Got pseudo-element ::before inside :not() at 12")
        assert get_error(':not(:not(a))') == (
            "Got nested :not()")
    # }}}

    def test_select(self):  # {{{
        document = etree.fromstring(
            self.HTML_IDS,
            parser=etree.XMLParser(
                recover=True, no_network=True, resolve_entities=False))
        select = Select(document)

        def select_ids(selector):
            for elem in select(selector):
                yield elem.get('id')

        def pcss(main, *selectors, **kwargs):
            result = list(select_ids(main))
            for selector in selectors:
                self.ae(list(select_ids(selector)), result)
            return result

        all_ids = pcss('*')
        self.ae(all_ids[:6], [
            'html', None, 'link-href', 'link-nohref', None, 'outer-div'])
        self.ae(all_ids[-1:], ['foobar-span'])
        self.ae(pcss('div'), ['outer-div', 'li-div', 'foobar-div'])
        self.ae(pcss('DIV'), [
            'outer-div', 'li-div', 'foobar-div'])  # case-insensitive in HTML
        self.ae(pcss('div div'), ['li-div'])
        self.ae(pcss('div, div div'), ['outer-div', 'li-div', 'foobar-div'])
        self.ae(pcss('a[name]'), ['name-anchor'])
        self.ae(pcss('a[NAme]'), ['name-anchor'])  # case-insensitive in HTML
        self.ae(pcss('a[rel]'), ['tag-anchor', 'nofollow-anchor'])
        self.ae(pcss('a[rel="tag"]'), ['tag-anchor'])
        self.ae(pcss('a[href*="localhost"]'), ['tag-anchor'])
        self.ae(pcss('a[href*=""]'), [])
        self.ae(pcss('a[href^="http"]'), ['tag-anchor', 'nofollow-anchor'])
        self.ae(pcss('a[href^="http:"]'), ['tag-anchor'])
        self.ae(pcss('a[href^=""]'), [])
        self.ae(pcss('a[href$="org"]'), ['nofollow-anchor'])
        self.ae(pcss('a[href$=""]'), [])
        self.ae(pcss('div[foobar~="bc"]', 'div[foobar~="cde"]',
                     skip_webkit=True), ['foobar-div'])
        self.ae(pcss('[foobar~="ab bc"]', '[foobar~=""]',
                     '[foobar~=" \t"]'), [])
        self.ae(pcss('div[foobar~="cd"]'), [])
        self.ae(pcss('*[lang|="En"]', '[lang|="En-us"]'), ['second-li'])
        # Attribute values are case sensitive
        self.ae(pcss('*[lang|="en"]', '[lang|="en-US"]',
                     skip_webkit=True), [])
        self.ae(pcss('*[lang|="e"]'), [])
        self.ae(pcss(':lang("EN")', '*:lang(en-US)', skip_webkit=True),
                ['second-li', 'li-div'])
        self.ae(pcss(':lang("e")'), [])
        self.ae(pcss('li:nth-child(1)', 'li:first-child'), ['first-li'])
        self.ae(pcss('li:nth-child(3)', '#first-li ~ :nth-child(3)'),
                ['third-li'])
        self.ae(pcss('li:nth-child(10)'), [])
        self.ae(pcss('li:nth-child(2n)', 'li:nth-child(even)',
                     'li:nth-child(2n+0)'),
                ['second-li', 'fourth-li', 'sixth-li'])
        self.ae(pcss('li:nth-child(+2n+1)', 'li:nth-child(odd)'),
                ['first-li', 'third-li', 'fifth-li', 'seventh-li'])
        self.ae(pcss('li:nth-child(2n+4)'), ['fourth-li', 'sixth-li'])
        self.ae(pcss('li:nth-child(3n+1)'),
                ['first-li', 'fourth-li', 'seventh-li'])
        self.ae(pcss('li:nth-last-child(0)'), [])
        self.ae(pcss('li:nth-last-child(1)', 'li:last-child'), ['seventh-li'])
        self.ae(pcss('li:nth-last-child(2n)', 'li:nth-last-child(even)'),
                ['second-li', 'fourth-li', 'sixth-li'])
        self.ae(pcss('li:nth-last-child(2n+2)'),
                ['second-li', 'fourth-li', 'sixth-li'])
        self.ae(pcss('ol:first-of-type'), ['first-ol'])
        self.ae(pcss('ol:nth-child(1)'), [])
        self.ae(pcss('ol:nth-of-type(2)'), ['second-ol'])
        self.ae(pcss('ol:nth-last-of-type(1)'), ['second-ol'])
        self.ae(pcss('span:only-child'), ['foobar-span'])
        self.ae(pcss('li div:only-child'), ['li-div'])
        self.ae(pcss('div *:only-child'), ['li-div', 'foobar-span'])
        self.ae(pcss('p *:only-of-type', skip_webkit=True),
                ['p-em', 'fieldset'])
        self.ae(pcss('p:only-of-type', skip_webkit=True), ['paragraph'])
        self.ae(pcss('a:empty', 'a:EMpty'), ['name-anchor'])
        self.ae(pcss('li:empty'),
                ['third-li', 'fourth-li', 'fifth-li', 'sixth-li'])
        self.ae(pcss(':root', 'html:root', 'li:root'), ['html'])
        self.ae(pcss('* :root', 'p *:root'), [])
        self.ae(pcss('.a', '.b', '*.a', 'ol.a'), ['first-ol'])
        self.ae(pcss('.c', '*.c'), ['first-ol', 'third-li', 'fourth-li'])
        self.ae(pcss('ol *.c', 'ol li.c', 'li ~ li.c', 'ol > li.c'), [
            'third-li', 'fourth-li'])
        self.ae(pcss('#first-li', 'li#first-li', '*#first-li'), ['first-li'])
        self.ae(pcss('li div', 'li > div', 'div div'), ['li-div'])
        self.ae(pcss('div > div'), [])
        self.ae(pcss('div>.c', 'div > .c'), ['first-ol'])
        self.ae(pcss('div + div'), ['foobar-div'])
        self.ae(pcss('a ~ a'), ['tag-anchor', 'nofollow-anchor'])
        self.ae(pcss('a[rel="tag"] ~ a'), ['nofollow-anchor'])
        self.ae(pcss('ol#first-ol li:last-child'), ['seventh-li'])
        self.ae(pcss('ol#first-ol *:last-child'), ['li-div', 'seventh-li'])
        self.ae(pcss('#outer-div:first-child'), ['outer-div'])
        self.ae(pcss('#outer-div :first-child'), [
            'name-anchor', 'first-li', 'li-div', 'p-b',
            'checkbox-fieldset-disabled', 'area-href'])
        self.ae(pcss('a[href]'), ['tag-anchor', 'nofollow-anchor'])
        self.ae(pcss(':not(*)'), [])
        self.ae(pcss('a:not([href])'), ['name-anchor'])
        self.ae(pcss('ol :Not(li[class])', skip_webkit=True), [
            'first-li', 'second-li', 'li-div',
            'fifth-li', 'sixth-li', 'seventh-li'])
        self.ae(pcss(r'di\a0 v', r'div\['), [])
        self.ae(pcss(r'[h\a0 ref]', r'[h\]ref]'), [])

        self.assertRaises(
            ExpressionError, lambda: tuple(select('body:nth-child')))

        select = Select(document, ignore_inappropriate_pseudo_classes=True)
        self.assertGreater(len(tuple(select('p:hover'))), 0)

    def test_select_shakespeare(self):  # {{{
        document = html.document_fromstring(self.HTML_SHAKESPEARE)
        select = Select(document)
        count = lambda s: sum(1 for r in select(s))

        # Data borrowed from http://mootools.net/slickspeed/

        # Changed from original; probably because I'm only
        self.ae(count('*'), 249)
        assert count('div:only-child') == 22  # ?
        assert count('div:nth-child(even)') == 106
        assert count('div:nth-child(2n)') == 106
        assert count('div:nth-child(odd)') == 137
        assert count('div:nth-child(2n+1)') == 137
        assert count('div:nth-child(n)') == 243
        assert count('div:last-child') == 53
        assert count('div:first-child') == 51
        assert count('div > div') == 242
        assert count('div + div') == 190
        assert count('div ~ div') == 190
        assert count('body') == 1
        assert count('body div') == 243
        assert count('div') == 243
        assert count('div div') == 242
        assert count('div div div') == 241
        assert count('div, div, div') == 243
        assert count('div, a, span') == 243
        assert count('.dialog') == 51
        assert count('div.dialog') == 51
        assert count('div .dialog') == 51
        assert count('div.character, div.dialog') == 99
        assert count('div.direction.dialog') == 0
        assert count('div.dialog.direction') == 0
        assert count('div.dialog.scene') == 1
        assert count('div.scene.scene') == 1
        assert count('div.scene .scene') == 0
        assert count('div.direction .dialog ') == 0
        assert count('div .dialog .direction') == 4
        assert count('div.dialog .dialog .direction') == 4
        assert count('#speech5') == 1
        assert count('div#speech5') == 1
        assert count('div #speech5') == 1
        assert count('div.scene div.dialog') == 49
        assert count('div#scene1 div.dialog div') == 142
        assert count('#scene1 #speech1') == 1
        assert count('div[class]') == 103
        assert count('div[class=dialog]') == 50
        assert count('div[class^=dia]') == 51
        assert count('div[class$=log]') == 50
        assert count('div[class*=sce]') == 1
        assert count('div[class|=dialog]') == 50  # ? Seems right
        assert count('div[class~=dialog]') == 51  # ? Seems right

    # }}}


# Run tests {{{
def find_tests():
    return unittest.defaultTestLoader.loadTestsFromTestCase(TestCSSSelectors)


def run_tests(find_tests=find_tests, for_build=False):
    if not for_build:
        parser = argparse.ArgumentParser()
        parser.add_argument('name', nargs='?', default=None,
                            help='The name of the test to run')
        args = parser.parse_args()
    if not for_build and args.name and args.name.startswith('.'):
        tests = find_tests()
        q = args.name[1:]
        if not q.startswith('test_'):
            q = 'test_' + q
        ans = None
        try:
            for test in tests:
                if test._testMethodName == q:
                    ans = test
                    raise StopIteration()
        except StopIteration:
            pass
        if ans is None:
            print('No test named %s found' % args.name)
            raise SystemExit(1)
        tests = ans
    else:
        tests = (unittest.defaultTestLoader.loadTestsFromName(args.name)
                 if not for_build and args.name else find_tests())
    r = unittest.TextTestRunner
    if for_build:
        r = r(verbosity=0, buffer=True, failfast=True)
    else:
        r = r(verbosity=4)
    result = r.run(tests)
    if (for_build and result.errors) or result.failures:
        raise SystemExit(1)


if __name__ == '__main__':
    run_tests()
# }}}
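run_tests() resolves a leading-dot name like `.parser` by scanning the loaded suite for a matching `_testMethodName`. The same lookup can be sketched in isolation; the `_Demo` test case below is a hypothetical stand-in for `TestCSSSelectors`:

```python
import unittest


class _Demo(unittest.TestCase):
    # Hypothetical test case standing in for TestCSSSelectors.
    def test_alpha(self):
        self.assertTrue(True)

    def test_beta(self):
        self.assertTrue(True)


def pick(tests, name):
    # Mirrors the ".name" branch of run_tests(): return the first test
    # whose method name matches, or None when nothing matches.
    for test in tests:
        if test._testMethodName == name:
            return test
    return None


suite = unittest.defaultTestLoader.loadTestsFromTestCase(_Demo)
```

`pick(suite, 'test_beta')` returns the matching case, while an unknown name yields None, which corresponds to the 'No test named ... found' error path above.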
759
ebook_converter/customize/__init__.py
Normal file
@@ -0,0 +1,759 @@
from __future__ import absolute_import, division, print_function, unicode_literals
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'

import os, sys, zipfile, importlib

from calibre.constants import numeric_version, iswindows, isosx
from calibre.ptempfile import PersistentTemporaryFile
from polyglot.builtins import unicode_type

platform = 'linux'
if iswindows:
    platform = 'windows'
elif isosx:
    platform = 'osx'


class PluginNotFound(ValueError):
    pass


class InvalidPlugin(ValueError):
    pass


class Plugin(object):  # {{{
    '''
    A calibre plugin. Useful members include:

    * ``self.plugin_path``: Stores path to the ZIP file that contains
      this plugin or None if it is a builtin plugin
    * ``self.site_customization``: Stores a customization string entered
      by the user.

    Methods that should be overridden in subclasses:

    * :meth:`initialize`
    * :meth:`customization_help`

    Useful methods:

    * :meth:`temporary_file`
    * :meth:`__enter__`
    * :meth:`load_resources`

    '''
    #: List of platforms this plugin works on.
    #: For example: ``['windows', 'osx', 'linux']``
    supported_platforms = []

    #: The name of this plugin. You must set it to something other
    #: than Trivial Plugin for it to work.
    name = 'Trivial Plugin'

    #: The version of this plugin as a 3-tuple (major, minor, revision)
    version = (1, 0, 0)

    #: A short string describing what this plugin does
    description = _('Does absolutely nothing')

    #: The author of this plugin
    author = _('Unknown')

    #: When more than one plugin exists for a filetype,
    #: the plugins are run in order of decreasing priority.
    #: Plugins with higher priority will be run first.
    #: The highest possible priority is ``sys.maxsize``.
    #: Default priority is 1.
    priority = 1

    #: The earliest version of calibre this plugin requires
    minimum_calibre_version = (0, 4, 118)

    #: If False, the user will not be able to disable this plugin. Use with
    #: care.
    can_be_disabled = True

    #: The type of this plugin. Used for categorizing plugins in the
    #: GUI
    type = _('Base')

    def __init__(self, plugin_path):
        self.plugin_path = plugin_path
        self.site_customization = None

    def initialize(self):
        '''
        Called once when calibre plugins are initialized. Plugins are
        re-initialized every time a new plugin is added. Also note that if the
        plugin is run in a worker process, such as for adding books, then the
        plugin will be initialized for every new worker process.

        Perform any plugin-specific initialization here, such as extracting
        resources from the plugin ZIP file. The path to the ZIP file is
        available as ``self.plugin_path``.

        Note that ``self.site_customization`` is **not** available at this
        point.
        '''
        pass

    def config_widget(self):
        '''
        Implement this method and :meth:`save_settings` in your plugin to
        use a custom configuration dialog, rather than relying on the simple
        string-based default customization.

        This method, if implemented, must return a QWidget. The widget can have
        an optional method validate() that takes no arguments and is called
        immediately after the user clicks OK. Changes are applied if and only
        if the method returns True.

        If for some reason you cannot perform the configuration at this time,
        return a tuple of two strings (message, details); these will be
        displayed as a warning dialog to the user and the process will be
        aborted.
        '''
        raise NotImplementedError()

    def save_settings(self, config_widget):
        '''
        Save the settings specified by the user with config_widget.

        :param config_widget: The widget returned by :meth:`config_widget`.
        '''
        raise NotImplementedError()

    def do_user_config(self, parent=None):
        '''
        This method shows a configuration dialog for this plugin. It returns
        True if the user clicks OK, False otherwise. The changes are
        automatically applied.
        '''
        from PyQt5.Qt import QDialog, QDialogButtonBox, QVBoxLayout, \
            QLabel, Qt, QLineEdit
        from calibre.gui2 import gprefs

        prefname = 'plugin config dialog:' + self.type + ':' + self.name
        geom = gprefs.get(prefname, None)

        config_dialog = QDialog(parent)
        button_box = QDialogButtonBox(
            QDialogButtonBox.Ok | QDialogButtonBox.Cancel)
        v = QVBoxLayout(config_dialog)

        def size_dialog():
            if geom is None:
                config_dialog.resize(config_dialog.sizeHint())
            else:
                from PyQt5.Qt import QApplication
                QApplication.instance().safe_restore_geometry(
                    config_dialog, geom)

        button_box.accepted.connect(config_dialog.accept)
        button_box.rejected.connect(config_dialog.reject)
        config_dialog.setWindowTitle(_('Customize') + ' ' + self.name)
        try:
            config_widget = self.config_widget()
        except NotImplementedError:
            config_widget = None

        if isinstance(config_widget, tuple):
            from calibre.gui2 import warning_dialog
            warning_dialog(parent, _('Cannot configure'), config_widget[0],
                           det_msg=config_widget[1], show=True)
            return False

        if config_widget is not None:
            v.addWidget(config_widget)
            v.addWidget(button_box)
            size_dialog()
            config_dialog.exec_()

            if config_dialog.result() == QDialog.Accepted:
                if hasattr(config_widget, 'validate'):
                    if config_widget.validate():
                        self.save_settings(config_widget)
                else:
                    self.save_settings(config_widget)
        else:
            from calibre.customize.ui import plugin_customization, \
                customize_plugin
            help_text = self.customization_help(gui=True)
            help_text = QLabel(help_text, config_dialog)
            help_text.setWordWrap(True)
            help_text.setTextInteractionFlags(
                Qt.LinksAccessibleByMouse | Qt.LinksAccessibleByKeyboard)
            help_text.setOpenExternalLinks(True)
            v.addWidget(help_text)
            sc = plugin_customization(self)
            if not sc:
                sc = ''
            sc = sc.strip()
            sc = QLineEdit(sc, config_dialog)
            v.addWidget(sc)
            v.addWidget(button_box)
            size_dialog()
            config_dialog.exec_()

            if config_dialog.result() == QDialog.Accepted:
                sc = unicode_type(sc.text()).strip()
                customize_plugin(self, sc)

        geom = bytearray(config_dialog.saveGeometry())
        gprefs[prefname] = geom

        return config_dialog.result()

    def load_resources(self, names):
        '''
        If this plugin comes in a ZIP file (user-added plugin), this method
        will allow you to load resources from the ZIP file.

        For example, to load an image::

            pixmap = QPixmap()
            pixmap.loadFromData(
                self.load_resources(['images/icon.png'])['images/icon.png'])
            icon = QIcon(pixmap)

        :param names: List of paths to resources in the ZIP file using / as
                      separator

        :return: A dictionary of the form ``{name: file_contents}``. Any names
                 that were not found in the ZIP file will not be present in
                 the dictionary.
        '''
        if self.plugin_path is None:
            raise ValueError('This plugin was not loaded from a ZIP file')
        ans = {}
        with zipfile.ZipFile(self.plugin_path, 'r') as zf:
            for candidate in zf.namelist():
                if candidate in names:
                    ans[candidate] = zf.read(candidate)
        return ans
|
||||||
|
|
||||||
|
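The ZIP lookup in ``load_resources`` above can be exercised standalone. A minimal sketch using only the standard library; the in-memory archive and the resource names are made up for illustration:

```python
import io
import zipfile

def load_resources(zip_bytes, names):
    # Same lookup as Plugin.load_resources(): return {name: contents}
    # for every requested name found in the ZIP; missing names are
    # simply absent from the result.
    ans = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes), 'r') as zf:
        for candidate in zf.namelist():
            if candidate in names:
                ans[candidate] = zf.read(candidate)
    return ans

# Build a throwaway plugin-style ZIP in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('images/icon.png', b'png-bytes')
    zf.writestr('plugin.py', b'# code')

res = load_resources(buf.getvalue(), ['images/icon.png', 'missing.txt'])
```

Note that a name not present in the archive is silently dropped rather than raising, matching the documented return contract.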
    def customization_help(self, gui=False):
        '''
        Return a string giving help on how to customize this plugin.
        By default raise a :class:`NotImplementedError`, which indicates that
        the plugin does not require customization.

        If you re-implement this method in your subclass, the user will
        be asked to enter a string as customization for this plugin.
        The customization string will be available as
        ``self.site_customization``.

        Site customization could be anything, for example, the path to
        a needed binary on the user's computer.

        :param gui: If True return HTML help, otherwise return plain text help.

        '''
        raise NotImplementedError()

    def temporary_file(self, suffix):
        '''
        Return a file-like object that is a temporary file on the file system.
        This file will remain available even after being closed and will only
        be removed on interpreter shutdown. Use the ``name`` member of the
        returned object to access the full path to the created temporary file.

        :param suffix: The suffix that the temporary file will have.
        '''
        return PersistentTemporaryFile(suffix)

    def is_customizable(self):
        try:
            self.customization_help()
            return True
        except NotImplementedError:
            return False

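The ``is_customizable`` check above relies on ``customization_help`` raising :class:`NotImplementedError` by default. A self-contained sketch of that probe; the ``Base`` and ``PathPlugin`` classes here are stand-ins, not calibre classes:

```python
class Base(object):
    # Default behaviour: no customization support, signalled by the
    # exception rather than a flag.
    def customization_help(self, gui=False):
        raise NotImplementedError()

    def is_customizable(self):
        try:
            self.customization_help()
            return True
        except NotImplementedError:
            return False

class PathPlugin(Base):
    # Overriding customization_help() is all that is needed for the
    # plugin to report itself as customizable.
    def customization_help(self, gui=False):
        return 'Enter the full path to the helper binary.'

ok = (Base().is_customizable(), PathPlugin().is_customizable())
```

Using the exception as the signal means subclasses opt in simply by implementing the help method; no separate boolean attribute has to be kept in sync.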
    def __enter__(self, *args):
        '''
        Add this plugin to the Python path so that its contents become directly importable.
        Useful when bundling large Python libraries into the plugin. Use it like this::

            with plugin:
                import something
        '''
        if self.plugin_path is not None:
            from calibre.utils.zipfile import ZipFile
            zf = ZipFile(self.plugin_path)
            extensions = {x.rpartition('.')[-1].lower() for x in
                          zf.namelist()}
            zip_safe = True
            for ext in ('pyd', 'so', 'dll', 'dylib'):
                if ext in extensions:
                    zip_safe = False
                    break
            if zip_safe:
                sys.path.insert(0, self.plugin_path)
                self.sys_insertion_path = self.plugin_path
            else:
                from calibre.ptempfile import TemporaryDirectory
                self._sys_insertion_tdir = TemporaryDirectory('plugin_unzip')
                self.sys_insertion_path = self._sys_insertion_tdir.__enter__(*args)
                zf.extractall(self.sys_insertion_path)
                sys.path.insert(0, self.sys_insertion_path)
            zf.close()

    def __exit__(self, *args):
        ip, it = getattr(self, 'sys_insertion_path', None), getattr(self,
                '_sys_insertion_tdir', None)
        if ip in sys.path:
            sys.path.remove(ip)
        if hasattr(it, '__exit__'):
            it.__exit__(*args)

    def cli_main(self, args):
        '''
        This method is the main entry point for your plugin's command line
        interface. It is called when the user does: calibre-debug -r "Plugin
        Name". Any arguments passed are present in the args variable.
        '''
        raise NotImplementedError('The %s plugin has no command line interface'
                                  % self.name)

    # }}}

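The zip-safety decision in ``__enter__`` above reduces to checking the file extensions present in the archive: compiled extension modules cannot be imported from inside a ZIP, so their presence forces extraction to a temporary directory. A minimal sketch of that check:

```python
def is_zip_safe(namelist):
    # A plugin ZIP can go straight onto sys.path only when it contains
    # no compiled extension modules, which the import system cannot
    # load from inside an archive.
    extensions = {x.rpartition('.')[-1].lower() for x in namelist}
    return all(ext not in extensions for ext in ('pyd', 'so', 'dll', 'dylib'))

pure = is_zip_safe(['plugin/__init__.py', 'plugin/data.json'])
compiled = is_zip_safe(['plugin/__init__.py', 'plugin/_speedup.so'])
```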
class FileTypePlugin(Plugin):  # {{{
    '''
    A plugin that is associated with a particular set of file types.
    '''

    #: Set of file types for which this plugin should be run.
    #: Use '*' for all file types.
    #: For example: ``{'lit', 'mobi', 'prc'}``
    file_types = set()

    #: If True, this plugin is run when books are added
    #: to the database
    on_import = False

    #: If True, this plugin is run after books are added
    #: to the database. In this case the postimport and postadd
    #: methods of the plugin are called.
    on_postimport = False

    #: If True, this plugin is run just before a conversion
    on_preprocess = False

    #: If True, this plugin is run after conversion
    #: on the final file produced by the conversion output plugin.
    on_postprocess = False

    type = _('File type')

    def run(self, path_to_ebook):
        '''
        Run the plugin. Must be implemented in subclasses.
        It should perform whatever modifications are required
        on the e-book and return the absolute path to the
        modified e-book. If no modifications are needed, it should
        return the path to the original e-book. If an error is encountered
        it should raise an Exception. The default implementation
        simply returns the path to the original e-book. Note that the path to
        the original file (before any file type plugins are run) is available as
        ``self.original_path_to_file``.

        The modified e-book file should be created with the
        :meth:`temporary_file` method.

        :param path_to_ebook: Absolute path to the e-book.

        :return: Absolute path to the modified e-book.
        '''
        # Default implementation does nothing
        return path_to_ebook

    def postimport(self, book_id, book_format, db):
        '''
        Called post import, i.e., after the book file has been added to the database. Note that
        this is different from :meth:`postadd` which is called when the book record is created for
        the first time. This method is called whenever a new file is added to a book record. It is
        useful for modifying the book record based on the contents of the newly added file.

        :param book_id: Database id of the added book.
        :param book_format: The file type of the book that was added.
        :param db: Library database.
        '''
        pass  # Default implementation does nothing

    def postadd(self, book_id, fmt_map, db):
        '''
        Called post add, i.e. after a book has been added to the db. Note that
        this is different from :meth:`postimport`, which is called after a single book file
        has been added to a book. postadd() is called only when an entire book record
        with possibly more than one book file has been created for the first time.
        This is useful if you wish to modify the book record in the database when the
        book is first added to calibre.

        :param book_id: Database id of the added book.
        :param fmt_map: Map of file format to path from which the file format
            was added. Note that this might or might not point to an actual
            existing file, as sometimes files are added as streams. In which case
            it might be a dummy value or a non-existent path.
        :param db: Library database
        '''
        pass  # Default implementation does nothing

    # }}}


class MetadataReaderPlugin(Plugin):  # {{{
    '''
    A plugin that implements reading metadata from a set of file types.
    '''
    #: Set of file types for which this plugin should be run.
    #: For example: ``set(['lit', 'mobi', 'prc'])``
    file_types = set()

    supported_platforms = ['windows', 'osx', 'linux']
    version = numeric_version
    author = 'Kovid Goyal'

    type = _('Metadata reader')

    def __init__(self, *args, **kwargs):
        Plugin.__init__(self, *args, **kwargs)
        self.quick = False

    def get_metadata(self, stream, type):
        '''
        Return metadata for the file represented by stream (a file-like object
        that supports reading). Raise an exception when there is an error
        with the input data.

        :param type: The type of file. Guaranteed to be one of the entries
            in :attr:`file_types`.
        :return: A :class:`calibre.ebooks.metadata.book.Metadata` object
        '''
        return None
    # }}}


class MetadataWriterPlugin(Plugin):  # {{{
    '''
    A plugin that implements writing metadata to a set of file types.
    '''
    #: Set of file types for which this plugin should be run.
    #: For example: ``set(['lit', 'mobi', 'prc'])``
    file_types = set()

    supported_platforms = ['windows', 'osx', 'linux']
    version = numeric_version
    author = 'Kovid Goyal'

    type = _('Metadata writer')

    def __init__(self, *args, **kwargs):
        Plugin.__init__(self, *args, **kwargs)
        self.apply_null = False

    def set_metadata(self, stream, mi, type):
        '''
        Set metadata for the file represented by stream (a file-like object
        that supports reading). Raise an exception when there is an error
        with the input data.

        :param type: The type of file. Guaranteed to be one of the entries
            in :attr:`file_types`.
        :param mi: A :class:`calibre.ebooks.metadata.book.Metadata` object
        '''
        pass

    # }}}


class CatalogPlugin(Plugin):  # {{{
    '''
    A plugin that implements a catalog generator.
    '''

    resources_path = None

    #: Output file type for which this plugin should be run.
    #: For example: 'epub' or 'xml'
    file_types = set()

    type = _('Catalog generator')

    #: CLI parser options specific to this plugin, declared as namedtuple Option::
    #:
    #:     from collections import namedtuple
    #:     Option = namedtuple('Option', 'option, default, dest, help')
    #:     cli_options = [Option('--catalog-title', default='My Catalog',
    #:         dest='catalog_title', help=(_('Title of generated catalog. \nDefault:') + " '" + '%default' + "'"))]
    #:
    #: cli_options is parsed in calibre.db.cli.cmd_catalog:option_parser()
    cli_options = []

    def _field_sorter(self, key):
        '''
        Custom fields sort after standard fields
        '''
        if key.startswith('#'):
            return '~%s' % key[1:]
        else:
            return key

    def search_sort_db(self, db, opts):

        db.search(opts.search_text)

        if opts.sort_by:
            # 2nd arg = ascending
            db.sort(opts.sort_by, True)
        return db.get_data_as_dict(ids=opts.ids)

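The sort key in ``_field_sorter`` above works because ``'~'`` orders after every ASCII letter, so mapping calibre's ``#``-prefixed custom fields to ``~``-prefixed keys pushes them behind all standard fields. A standalone sketch with an example field list:

```python
def field_sorter(key):
    # Map '#genre' -> '~genre' so custom fields sort after standard
    # ones ('~' is greater than any ASCII letter).
    return '~%s' % key[1:] if key.startswith('#') else key

fields = ['title', '#genre', 'authors', '#rating']
ordered = sorted(fields, key=field_sorter)
```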
    def get_output_fields(self, db, opts):
        # Return a list of requested fields
        all_std_fields = {'author_sort', 'authors', 'comments', 'cover', 'formats',
                          'id', 'isbn', 'library_name', 'ondevice', 'pubdate', 'publisher',
                          'rating', 'series_index', 'series', 'size', 'tags', 'timestamp',
                          'title_sort', 'title', 'uuid', 'languages', 'identifiers'}
        all_custom_fields = set(db.custom_field_keys())
        for field in list(all_custom_fields):
            fm = db.field_metadata[field]
            if fm['datatype'] == 'series':
                all_custom_fields.add(field + '_index')
        all_fields = all_std_fields.union(all_custom_fields)

        if opts.fields != 'all':
            # Make a list from opts.fields
            of = [x.strip() for x in opts.fields.split(',')]
            requested_fields = set(of)

            # Validate requested_fields
            if requested_fields - all_fields:
                from calibre.library import current_library_name
                invalid_fields = sorted(list(requested_fields - all_fields))
                print("invalid --fields specified: %s" % ', '.join(invalid_fields))
                print("available fields in '%s': %s" %
                      (current_library_name(), ', '.join(sorted(list(all_fields)))))
                raise ValueError("unable to generate catalog with specified fields")

            fields = [x for x in of if x in all_fields]
        else:
            fields = sorted(all_fields, key=self._field_sorter)

        if not opts.connected_device['is_device_connected'] and 'ondevice' in fields:
            fields.pop(int(fields.index('ondevice')))

        return fields

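The field validation in ``get_output_fields`` above is plain set arithmetic: unknown names are rejected, and known names keep their requested order. A reduced sketch using a made-up field universe in place of the database-derived one:

```python
all_fields = {'title', 'authors', 'tags', '#genre'}

def select_fields(spec):
    # Mirror of the validation step: reject unknown names, then keep
    # the requested order of the known ones.
    of = [x.strip() for x in spec.split(',')]
    invalid = sorted(set(of) - all_fields)
    if invalid:
        raise ValueError('invalid --fields specified: %s' % ', '.join(invalid))
    return [x for x in of if x in all_fields]

chosen = select_fields('title, #genre, authors')
```

Because the result is rebuilt from the list ``of`` rather than the set, the user's ordering survives the set-based validation.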
    def initialize(self):
        '''
        If plugin is not a built-in, copy the plugin's .ui and .py files from
        the ZIP file to $TMPDIR.
        Tab will be dynamically generated and added to the Catalog Options dialog in
        calibre.gui2.dialogs.catalog.py:Catalog
        '''
        from calibre.customize.builtins import plugins as builtin_plugins
        from calibre.customize.ui import config
        from calibre.ptempfile import PersistentTemporaryDirectory

        if type(self) not in builtin_plugins and self.name not in config['disabled_plugins']:
            files_to_copy = ["%s.%s" % (self.name.lower(), ext) for ext in ["ui", "py"]]
            resources = zipfile.ZipFile(self.plugin_path, 'r')

            if self.resources_path is None:
                self.resources_path = PersistentTemporaryDirectory('_plugin_resources', prefix='')

            for file in files_to_copy:
                try:
                    resources.extract(file, self.resources_path)
                except:
                    print(" customize:__init__.initialize(): %s not found in %s" % (file, os.path.basename(self.plugin_path)))
                    continue
            resources.close()

    def run(self, path_to_output, opts, db, ids, notification=None):
        '''
        Run the plugin. Must be implemented in subclasses.
        It should generate the catalog in the format specified
        in file_types, returning the absolute path to the
        generated catalog file. If an error is encountered
        it should raise an Exception.

        The generated catalog file should be created with the
        :meth:`temporary_file` method.

        :param path_to_output: Absolute path to the generated catalog file.
        :param opts: A dictionary of keyword arguments
        :param db: A LibraryDatabase2 object
        '''
        raise NotImplementedError('CatalogPlugin.generate_catalog() default '
                                  'method, should be overridden in subclass')

    # }}}


class InterfaceActionBase(Plugin):  # {{{

    supported_platforms = ['windows', 'osx', 'linux']
    author = 'Kovid Goyal'
    type = _('User interface action')
    can_be_disabled = False

    actual_plugin = None

    def __init__(self, *args, **kwargs):
        Plugin.__init__(self, *args, **kwargs)
        self.actual_plugin_ = None

    def load_actual_plugin(self, gui):
        '''
        This method must return the actual interface action plugin object.
        '''
        ac = self.actual_plugin_
        if ac is None:
            mod, cls = self.actual_plugin.split(':')
            ac = getattr(importlib.import_module(mod), cls)(gui,
                    self.site_customization)
            self.actual_plugin_ = ac
        return ac

    # }}}


class PreferencesPlugin(Plugin):  # {{{

    '''
    A plugin representing a widget displayed in the Preferences dialog.

    This plugin has only one important method :meth:`create_widget`. The
    various fields of the plugin control how it is categorized in the UI.
    '''

    supported_platforms = ['windows', 'osx', 'linux']
    author = 'Kovid Goyal'
    type = _('Preferences')
    can_be_disabled = False

    #: Import path to module that contains a class named ConfigWidget
    #: which implements the ConfigWidgetInterface. Used by
    #: :meth:`create_widget`.
    config_widget = None

    #: Where in the list of categories the :attr:`category` of this plugin should be.
    category_order = 100

    #: Where in the list of names in a category, the :attr:`gui_name` of this
    #: plugin should be
    name_order = 100

    #: The category this plugin should be in
    category = None

    #: The category name displayed to the user for this plugin
    gui_category = None

    #: The name displayed to the user for this plugin
    gui_name = None

    #: The icon for this plugin, should be an absolute path
    icon = None

    #: The description used for tooltips and the like
    description = None

    def create_widget(self, parent=None):
        '''
        Create and return the actual Qt widget used for setting this group of
        preferences. The widget must implement the
        :class:`calibre.gui2.preferences.ConfigWidgetInterface`.

        The default implementation uses :attr:`config_widget` to instantiate
        the widget.
        '''
        base, _, wc = self.config_widget.partition(':')
        if not wc:
            wc = 'ConfigWidget'
        base = importlib.import_module(base)
        widget = getattr(base, wc)
        return widget(parent)

    # }}}


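``create_widget`` above resolves :attr:`config_widget` as a ``'module[:ClassName]'`` spec, with the class name defaulting to ``ConfigWidget``. The parsing step in isolation; the module paths below are only examples:

```python
def parse_widget_spec(spec):
    # 'pkg.mod'          -> ('pkg.mod', 'ConfigWidget')
    # 'pkg.mod:MyWidget' -> ('pkg.mod', 'MyWidget')
    base, _, wc = spec.partition(':')
    if not wc:
        wc = 'ConfigWidget'
    return base, wc

a = parse_widget_spec('calibre.gui2.preferences.look_feel')
b = parse_widget_spec('my.plugin.config:FancyWidget')
```

``str.partition`` is a good fit here because it never raises: with no ``':'`` present it returns the whole string plus two empty strings, which is exactly the default case.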
class StoreBase(Plugin):  # {{{

    supported_platforms = ['windows', 'osx', 'linux']
    author = 'John Schember'
    type = _('Store')
    # Information about the store. Should be in the primary language
    # of the store. This should not be translatable when set by
    # a subclass.
    description = _('An e-book store.')
    minimum_calibre_version = (0, 8, 0)
    version = (1, 0, 1)

    actual_plugin = None

    # Does the store only distribute e-books without DRM?
    drm_free_only = False
    # This is the 2 letter country code for the corporate
    # headquarters of the store.
    headquarters = ''
    # All formats the store distributes e-books in.
    formats = []
    # Is this store on an affiliate program?
    affiliate = False

    def load_actual_plugin(self, gui):
        '''
        This method must return the actual store plugin object.
        '''
        mod, cls = self.actual_plugin.split(':')
        self.actual_plugin_object = getattr(importlib.import_module(mod), cls)(gui, self.name)
        return self.actual_plugin_object

    def customization_help(self, gui=False):
        if getattr(self, 'actual_plugin_object', None) is not None:
            return self.actual_plugin_object.customization_help(gui)
        raise NotImplementedError()

    def config_widget(self):
        if getattr(self, 'actual_plugin_object', None) is not None:
            return self.actual_plugin_object.config_widget()
        raise NotImplementedError()

    def save_settings(self, config_widget):
        if getattr(self, 'actual_plugin_object', None) is not None:
            return self.actual_plugin_object.save_settings(config_widget)
        raise NotImplementedError()

    # }}}


class EditBookToolPlugin(Plugin):  # {{{

    type = _('Edit book tool')
    minimum_calibre_version = (1, 46, 0)

    # }}}


class LibraryClosedPlugin(Plugin):  # {{{
    '''
    LibraryClosedPlugins are run when a library is closed, either at shutdown,
    when the library is changed, or when a library is used in some other way.
    At the moment these plugins won't be called by the CLI functions.
    '''
    type = _('Library closed')

    # minimum version 2.54 because that is when support was added
    minimum_calibre_version = (2, 54, 0)

    def run(self, db):
        '''
        The db will be a reference to the new_api (db.cache.py).

        The plugin must run to completion. It must not use the GUI, threads, or
        any signals.
        '''
        raise NotImplementedError('LibraryClosedPlugin '
                                  'run method must be overridden in subclass')
    # }}}
1973  ebook_converter/customize/builtins.py  Normal file
File diff suppressed because it is too large.

376  ebook_converter/customize/conversion.py  Normal file
@@ -0,0 +1,376 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals
'''
Defines the plugin system for conversions.
'''
import re, os, shutil, numbers

from calibre import CurrentDir
from calibre.customize import Plugin
from polyglot.builtins import unicode_type


class ConversionOption(object):

    '''
    Class representing conversion options
    '''

    def __init__(self, name=None, help=None, long_switch=None,
                 short_switch=None, choices=None):
        self.name = name
        self.help = help
        self.long_switch = long_switch
        self.short_switch = short_switch
        self.choices = choices

        if self.long_switch is None:
            self.long_switch = self.name.replace('_', '-')

        self.validate_parameters()

    def validate_parameters(self):
        '''
        Validate the parameters passed to :meth:`__init__`.
        '''
        if re.match(r'[a-zA-Z_]([a-zA-Z0-9_])*', self.name) is None:
            raise ValueError(self.name + ' is not a valid Python identifier')
        if not self.help:
            raise ValueError('You must set the help text')

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        return self.name == getattr(other, 'name', other)

    def clone(self):
        return ConversionOption(name=self.name, help=self.help,
                long_switch=self.long_switch, short_switch=self.short_switch,
                choices=self.choices)

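``ConversionOption`` above derives the long CLI switch from the option name and validates the name with a regex. Both steps sketched standalone, using the same regex and replacement as the class:

```python
import re

def long_switch_for(name):
    # Same checks as ConversionOption.validate_parameters(), followed by
    # the underscore-to-hyphen derivation done in __init__() when no
    # explicit long switch is supplied.
    if re.match(r'[a-zA-Z_]([a-zA-Z0-9_])*', name) is None:
        raise ValueError(name + ' is not a valid Python identifier')
    return name.replace('_', '-')

switch = long_switch_for('input_encoding')
```

So the option ``input_encoding`` is exposed on the command line as ``--input-encoding`` without the author spelling the switch out.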
class OptionRecommendation(object):
    LOW = 1
    MED = 2
    HIGH = 3

    def __init__(self, recommended_value=None, level=LOW, **kwargs):
        '''
        An option recommendation. That is, an option as well as its recommended
        value and the level of the recommendation.
        '''
        self.level = level
        self.recommended_value = recommended_value
        self.option = kwargs.pop('option', None)
        if self.option is None:
            self.option = ConversionOption(**kwargs)

        self.validate_parameters()

    @property
    def help(self):
        return self.option.help

    def clone(self):
        return OptionRecommendation(recommended_value=self.recommended_value,
                level=self.level, option=self.option.clone())

    def validate_parameters(self):
        if self.option.choices and self.recommended_value not in \
                self.option.choices:
            raise ValueError('OpRec: %s: Recommended value not in choices' %
                             self.option.name)
        if not (isinstance(self.recommended_value, (numbers.Number, bytes, unicode_type)) or self.recommended_value is None):
            raise ValueError('OpRec: %s:' % self.option.name + repr(
                self.recommended_value) + ' is not a string or a number')

class DummyReporter(object):

    def __init__(self):
        self.cancel_requested = False

    def __call__(self, percent, msg=''):
        pass

def gui_configuration_widget(name, parent, get_option_by_name,
                             get_option_help, db, book_id, for_output=True):
    import importlib

    def widget_factory(cls):
        return cls(parent, get_option_by_name,
                   get_option_help, db, book_id)

    if for_output:
        try:
            output_widget = importlib.import_module(
                    'calibre.gui2.convert.' + name)
            pw = output_widget.PluginWidget
            pw.ICON = I('back.png')
            pw.HELP = _('Options specific to the output format.')
            return widget_factory(pw)
        except ImportError:
            pass
    else:
        try:
            input_widget = importlib.import_module(
                    'calibre.gui2.convert.' + name)
            pw = input_widget.PluginWidget
            pw.ICON = I('forward.png')
            pw.HELP = _('Options specific to the input format.')
            return widget_factory(pw)
        except ImportError:
            pass
    return None

class InputFormatPlugin(Plugin):

    '''
    InputFormatPlugins are responsible for converting a document into
    HTML+OPF+CSS+etc.
    The results of the conversion *must* be encoded in UTF-8.
    The main action happens in :meth:`convert`.
    '''

    type = _('Conversion input')
    can_be_disabled = False
    supported_platforms = ['windows', 'osx', 'linux']
    commit_name = None  # unique name under which options for this plugin are saved
    ui_data = None

    #: Set of file types for which this plugin should be run.
    #: For example: ``set(['azw', 'mobi', 'prc'])``
    file_types = set()

    #: If True, this input plugin generates a collection of images,
    #: one per HTML file. This can be set dynamically, in the convert method
    #: if the input files can be both image collections and non-image collections.
    #: If you set this to True, you must implement the get_images() method that returns
    #: a list of images.
    is_image_collection = False

    #: Number of CPU cores used by this plugin.
    #: A value of -1 means that it uses all available cores
    core_usage = 1

    #: If set to True, the input plugin will perform special processing
    #: to make its output suitable for viewing
    for_viewer = False

    #: The encoding that this input plugin creates files in. A value of
    #: None means that the encoding is undefined and must be
    #: detected individually
    output_encoding = 'utf-8'

    #: Options shared by all input format plugins. Do not override
    #: in sub-classes. Use :attr:`options` instead. Every option must be an
    #: instance of :class:`OptionRecommendation`.
    common_options = {
        OptionRecommendation(name='input_encoding',
            recommended_value=None, level=OptionRecommendation.LOW,
            help=_('Specify the character encoding of the input document. If '
                   'set, this option will override any encoding declared by the '
                   'document itself. Particularly useful for documents that '
                   'do not declare an encoding or that have erroneous '
                   'encoding declarations.')
        )}

    #: Options to customize the behavior of this plugin. Every option must be an
    #: instance of :class:`OptionRecommendation`.
    options = set()

    #: A set of 3-tuples of the form
    #: (option_name, recommended_value, recommendation_level)
    recommendations = set()

    def __init__(self, *args):
        Plugin.__init__(self, *args)
        self.report_progress = DummyReporter()

    def get_images(self):
        '''
        Return a list of absolute paths to the images, if this input plugin
        represents an image collection. The list of images is in the same order
        as the spine and the TOC.
        '''
        raise NotImplementedError()

    def convert(self, stream, options, file_ext, log, accelerators):
        '''
        This method must be implemented in sub-classes. It must return
        the path to the created OPF file or an :class:`OEBBook` instance.
        All output should be contained in the current directory.
        If this plugin creates files outside the current
        directory they must be deleted/marked for deletion before this method
        returns.

        :param stream: A file like object that contains the input file.
        :param options: Options to customize the conversion process.
            Guaranteed to have attributes corresponding
            to all the options declared by this plugin. In
            addition, it will have a verbose attribute that
            takes integral values from zero upwards. Higher numbers
            mean more verbose output. Another useful attribute is
            ``input_profile`` that is an instance of
            :class:`calibre.customize.profiles.InputProfile`.
        :param file_ext: The extension (without the .) of the input file. It
            is guaranteed to be one of the `file_types` supported
            by this plugin.
        :param log: A :class:`calibre.utils.logging.Log` object. All output
            should use this object.
        :param accelerators: A dictionary of various information that the input
            plugin can get easily that would speed up the
            subsequent stages of the conversion.

        '''
        raise NotImplementedError()

    def __call__(self, stream, options, file_ext, log,
                 accelerators, output_dir):
        try:
            log('InputFormatPlugin: %s running' % self.name)
            if hasattr(stream, 'name'):
                log('on', stream.name)
        except:
            # In case stdout is broken
            pass

        with CurrentDir(output_dir):
            for x in os.listdir('.'):
                shutil.rmtree(x) if os.path.isdir(x) else os.remove(x)

            ret = self.convert(stream, options, file_ext,
                               log, accelerators)

        return ret

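Before delegating to :meth:`convert`, ``__call__`` above empties the output directory so the plugin's results are the only files left in it. A standalone sketch of that cleanup, using explicit paths instead of the ``CurrentDir`` context manager:

```python
import os
import shutil
import tempfile

def clear_directory(path):
    # Remove every file and subdirectory inside path, mirroring the
    # loop InputFormatPlugin.__call__ runs on the conversion output dir.
    for x in os.listdir(path):
        full = os.path.join(path, x)
        shutil.rmtree(full) if os.path.isdir(full) else os.remove(full)

d = tempfile.mkdtemp()
open(os.path.join(d, 'stale.html'), 'w').close()
os.mkdir(os.path.join(d, 'images'))
clear_directory(d)
leftover = os.listdir(d)
```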
    def postprocess_book(self, oeb, opts, log):
        '''
        Called to allow the input plugin to perform postprocessing after
        the book has been parsed.
        '''
        pass

    def specialize(self, oeb, opts, log, output_fmt):
        '''
        Called to allow the input plugin to specialize the parsed book
        for a particular output format. Called after postprocess_book
        and before any transforms are performed on the parsed book.
        '''
        pass

def gui_configuration_widget(self, parent, get_option_by_name,
|
||||||
|
get_option_help, db, book_id=None):
|
||||||
|
'''
|
||||||
|
Called to create the widget used for configuring this plugin in the
|
||||||
|
calibre GUI. The widget must be an instance of the PluginWidget class.
|
||||||
|
See the builtin input plugins for examples.
|
||||||
|
'''
|
||||||
|
name = self.name.lower().replace(' ', '_')
|
||||||
|
return gui_configuration_widget(name, parent, get_option_by_name,
|
||||||
|
get_option_help, db, book_id, for_output=False)
|
||||||
|
|
||||||
|
|
||||||
|
class OutputFormatPlugin(Plugin):
|
||||||
|
|
||||||
|
'''
|
||||||
|
OutputFormatPlugins are responsible for converting an OEB document
|
||||||
|
(OPF+HTML) into an output e-book.
|
||||||
|
|
||||||
|
The OEB document can be assumed to be encoded in UTF-8.
|
||||||
|
The main action happens in :meth:`convert`.
|
||||||
|
'''
|
||||||
|
|
||||||
|
type = _('Conversion output')
|
||||||
|
can_be_disabled = False
|
||||||
|
supported_platforms = ['windows', 'osx', 'linux']
|
||||||
|
commit_name = None # unique name under which options for this plugin are saved
|
||||||
|
ui_data = None
|
||||||
|
|
||||||
|
#: The file type (extension without leading period) that this
|
||||||
|
#: plugin outputs
|
||||||
|
file_type = None
|
||||||
|
|
||||||
|
#: Options shared by all Input format plugins. Do not override
|
||||||
|
#: in sub-classes. Use :attr:`options` instead. Every option must be an
|
||||||
|
#: instance of :class:`OptionRecommendation`.
|
||||||
|
common_options = {
|
||||||
|
OptionRecommendation(name='pretty_print',
|
||||||
|
recommended_value=False, level=OptionRecommendation.LOW,
|
||||||
|
help=_('If specified, the output plugin will try to create output '
|
||||||
|
'that is as human readable as possible. May not have any effect '
|
||||||
|
'for some output plugins.')
|
||||||
|
)}
|
||||||
|
|
||||||
|
#: Options to customize the behavior of this plugin. Every option must be an
|
||||||
|
#: instance of :class:`OptionRecommendation`.
|
||||||
|
options = set()
|
||||||
|
|
||||||
|
#: A set of 3-tuples of the form
|
||||||
|
#: (option_name, recommended_value, recommendation_level)
|
||||||
|
recommendations = set()
|
||||||
|
|
||||||
|
@property
|
||||||
|
def description(self):
|
||||||
|
return _('Convert e-books to the %s format')%self.file_type
|
||||||
|
|
||||||
|
def __init__(self, *args):
|
||||||
|
Plugin.__init__(self, *args)
|
||||||
|
self.report_progress = DummyReporter()
|
||||||
|
|
||||||
|
def convert(self, oeb_book, output, input_plugin, opts, log):
|
||||||
|
'''
|
||||||
|
Render the contents of `oeb_book` (which is an instance of
|
||||||
|
:class:`calibre.ebooks.oeb.OEBBook`) to the file specified by output.
|
||||||
|
|
||||||
|
:param output: Either a file like object or a string. If it is a string
|
||||||
|
it is the path to a directory that may or may not exist. The output
|
||||||
|
plugin should write its output into that directory. If it is a file like
|
||||||
|
object, the output plugin should write its output into the file.
|
||||||
|
:param input_plugin: The input plugin that was used at the beginning of
|
||||||
|
the conversion pipeline.
|
||||||
|
:param opts: Conversion options. Guaranteed to have attributes
|
||||||
|
corresponding to the OptionRecommendations of this plugin.
|
||||||
|
:param log: The logger. Print debug/info messages etc. using this.
|
||||||
|
|
||||||
|
'''
|
||||||
|
raise NotImplementedError()
|
||||||
|
|
||||||
|
@property
|
||||||
|
def is_periodical(self):
|
||||||
|
return self.oeb.metadata.publication_type and \
|
||||||
|
unicode_type(self.oeb.metadata.publication_type[0]).startswith('periodical:')
|
||||||
|
|
||||||
|
def specialize_options(self, log, opts, input_fmt):
|
||||||
|
'''
|
||||||
|
Can be used to change the values of conversion options, as used by the
|
||||||
|
conversion pipeline.
|
||||||
|
'''
|
||||||
|
pass
|
||||||
|
|
||||||
|
def specialize_css_for_output(self, log, opts, item, stylizer):
|
||||||
|
'''
|
||||||
|
Can be used to make changes to the css during the CSS flattening
|
||||||
|
process.
|
||||||
|
|
||||||
|
:param item: The item (HTML file) being processed
|
||||||
|
:param stylizer: A Stylizer object containing the flattened styles for
|
||||||
|
item. You can get the style for any element by
|
||||||
|
stylizer.style(element).
|
||||||
|
|
||||||
|
'''
|
||||||
|
pass
|
||||||
|
|
||||||
|
def gui_configuration_widget(self, parent, get_option_by_name,
|
||||||
|
get_option_help, db, book_id=None):
|
||||||
|
'''
|
||||||
|
Called to create the widget used for configuring this plugin in the
|
||||||
|
calibre GUI. The widget must be an instance of the PluginWidget class.
|
||||||
|
See the builtin output plugins for examples.
|
||||||
|
'''
|
||||||
|
name = self.name.lower().replace(' ', '_')
|
||||||
|
return gui_configuration_widget(name, parent, get_option_by_name,
|
||||||
|
get_option_help, db, book_id, for_output=True)
|
873  ebook_converter/customize/profiles.py  Normal file
@@ -0,0 +1,873 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

from calibre.customize import Plugin as _Plugin
from polyglot.builtins import zip

FONT_SIZES = [('xx-small', 1),
              ('x-small', None),
              ('small', 2),
              ('medium', 3),
              ('large', 4),
              ('x-large', 5),
              ('xx-large', 6),
              (None, 7)]


class Plugin(_Plugin):

    fbase = 12
    fsizes = [5, 7, 9, 12, 13.5, 17, 20, 22, 24]
    screen_size = (1600, 1200)
    dpi = 100

    def __init__(self, *args, **kwargs):
        _Plugin.__init__(self, *args, **kwargs)
        self.width, self.height = self.screen_size
        fsizes = list(self.fsizes)
        self.fkey = list(self.fsizes)
        self.fsizes = []
        for (name, num), size in zip(FONT_SIZES, fsizes):
            self.fsizes.append((name, num, float(size)))
        self.fnames = dict((name, sz) for name, _, sz in self.fsizes if name)
        self.fnums = dict((num, sz) for _, num, sz in self.fsizes if num)
        self.width_pts = self.width * 72./self.dpi
        self.height_pts = self.height * 72./self.dpi
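The `__init__` above turns the class-level font tables into lookup maps (CSS keyword → pt, HTML `<font size=n>` → pt) and converts the screen size from pixels to points at 72 pt per inch. The same arithmetic, reproduced standalone with the default profile's values:

```python
# Reproduces the Plugin.__init__ computations above for the default profile.
FONT_SIZES = [('xx-small', 1), ('x-small', None), ('small', 2), ('medium', 3),
              ('large', 4), ('x-large', 5), ('xx-large', 6), (None, 7)]
fsizes = [5, 7, 9, 12, 13.5, 17, 20, 22, 24]  # default sizes, in pt
screen_size, dpi = (1600, 1200), 100

# zip() stops at the shorter sequence, so the 9th size (24) is unused here.
triples = [(name, num, float(size))
           for (name, num), size in zip(FONT_SIZES, fsizes)]
fnames = {name: sz for name, _, sz in triples if name}  # CSS keyword -> pt
fnums = {num: sz for _, num, sz in triples if num}      # <font size=n> -> pt

width_pts = screen_size[0] * 72. / dpi   # pixels -> points
height_pts = screen_size[1] * 72. / dpi
```

So for the default 1600x1200 @ 100 dpi profile, 'medium' maps to 12 pt and the screen is 1152x864 pt.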
# Input profiles {{{


class InputProfile(Plugin):

    author = 'Kovid Goyal'
    supported_platforms = {'windows', 'osx', 'linux'}
    can_be_disabled = False
    type = _('Input profile')

    name = 'Default Input Profile'
    short_name = 'default'  # Used in the CLI so don't use spaces etc. in it
    description = _('This profile tries to provide sane defaults and is useful '
                    'if you know nothing about the input document.')


class SonyReaderInput(InputProfile):

    name = 'Sony Reader'
    short_name = 'sony'
    description = _('This profile is intended for the SONY PRS line. '
                    'The 500/505/600/700 etc.')

    screen_size = (584, 754)
    dpi = 168.451
    fbase = 12
    fsizes = [7.5, 9, 10, 12, 15.5, 20, 22, 24]


class SonyReader300Input(SonyReaderInput):

    name = 'Sony Reader 300'
    short_name = 'sony300'
    description = _('This profile is intended for the SONY PRS 300.')

    dpi = 200


class SonyReader900Input(SonyReaderInput):

    author = 'John Schember'
    name = 'Sony Reader 900'
    short_name = 'sony900'
    description = _('This profile is intended for the SONY PRS-900.')

    screen_size = (584, 978)


class MSReaderInput(InputProfile):

    name = 'Microsoft Reader'
    short_name = 'msreader'
    description = _('This profile is intended for the Microsoft Reader.')

    screen_size = (480, 652)
    dpi = 96
    fbase = 13
    fsizes = [10, 11, 13, 16, 18, 20, 22, 26]


class MobipocketInput(InputProfile):

    name = 'Mobipocket Books'
    short_name = 'mobipocket'
    description = _('This profile is intended for the Mobipocket books.')

    # Unfortunately MOBI books are not narrowly targeted, so this information is
    # quite likely to be spurious
    screen_size = (600, 800)
    dpi = 96
    fbase = 18
    fsizes = [14, 14, 16, 18, 20, 22, 24, 26]


class HanlinV3Input(InputProfile):

    name = 'Hanlin V3'
    short_name = 'hanlinv3'
    description = _('This profile is intended for the Hanlin V3 and its clones.')

    # Screen size is a best guess
    screen_size = (584, 754)
    dpi = 168.451
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]


class HanlinV5Input(HanlinV3Input):

    name = 'Hanlin V5'
    short_name = 'hanlinv5'
    description = _('This profile is intended for the Hanlin V5 and its clones.')

    # Screen size is a best guess
    screen_size = (584, 754)
    dpi = 200


class CybookG3Input(InputProfile):

    name = 'Cybook G3'
    short_name = 'cybookg3'
    description = _('This profile is intended for the Cybook G3.')

    # Screen size is a best guess
    screen_size = (600, 800)
    dpi = 168.451
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]


class CybookOpusInput(InputProfile):

    author = 'John Schember'
    name = 'Cybook Opus'
    short_name = 'cybook_opus'
    description = _('This profile is intended for the Cybook Opus.')

    # Screen size is a best guess
    screen_size = (600, 800)
    dpi = 200
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]


class KindleInput(InputProfile):

    name = 'Kindle'
    short_name = 'kindle'
    description = _('This profile is intended for the Amazon Kindle.')

    # Screen size is a best guess
    screen_size = (525, 640)
    dpi = 168.451
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]


class IlliadInput(InputProfile):

    name = 'Illiad'
    short_name = 'illiad'
    description = _('This profile is intended for the Irex Illiad.')

    screen_size = (760, 925)
    dpi = 160.0
    fbase = 12
    fsizes = [7.5, 9, 10, 12, 15.5, 20, 22, 24]


class IRexDR1000Input(InputProfile):

    author = 'John Schember'
    name = 'IRex Digital Reader 1000'
    short_name = 'irexdr1000'
    description = _('This profile is intended for the IRex Digital Reader 1000.')

    # Screen size is a best guess
    screen_size = (1024, 1280)
    dpi = 160
    fbase = 16
    fsizes = [12, 14, 16, 18, 20, 22, 24]


class IRexDR800Input(InputProfile):

    author = 'Eric Cronin'
    name = 'IRex Digital Reader 800'
    short_name = 'irexdr800'
    description = _('This profile is intended for the IRex Digital Reader 800.')

    screen_size = (768, 1024)
    dpi = 160
    fbase = 16
    fsizes = [12, 14, 16, 18, 20, 22, 24]


class NookInput(InputProfile):

    author = 'John Schember'
    name = 'Nook'
    short_name = 'nook'
    description = _('This profile is intended for the B&N Nook.')

    # Screen size is a best guess
    screen_size = (600, 800)
    dpi = 167
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]


input_profiles = [InputProfile, SonyReaderInput, SonyReader300Input,
    SonyReader900Input, MSReaderInput, MobipocketInput, HanlinV3Input,
    HanlinV5Input, CybookG3Input, CybookOpusInput, KindleInput, IlliadInput,
    IRexDR1000Input, IRexDR800Input, NookInput]

input_profiles.sort(key=lambda x: x.name.lower())

# }}}
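Each profile's `short_name` is what the CLI accepts (e.g. `--input-profile=sony`). A sketch of how such a value can be resolved against the `input_profiles` list above; the profile classes are stubbed here so it runs standalone, and `profile_for` is an illustrative helper, not the pipeline's actual resolver:

```python
# Stubs standing in for two of the profile classes above.
class InputProfile:
    name = 'Default Input Profile'
    short_name = 'default'  # the value used on the CLI


class SonyReaderInput(InputProfile):
    name = 'Sony Reader'
    short_name = 'sony'


input_profiles = [InputProfile, SonyReaderInput]
input_profiles.sort(key=lambda x: x.name.lower())  # as done above


def profile_for(short_name):
    # Linear scan: the list is small and sorted by display name, not short_name.
    for p in input_profiles:
        if p.short_name == short_name:
            return p
    raise ValueError('unknown profile: %s' % short_name)
```

So `profile_for('sony')` yields the `SonyReaderInput` class, whose `screen_size`/`dpi`/`fsizes` attributes then parameterize the conversion.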
class OutputProfile(Plugin):

    author = 'Kovid Goyal'
    supported_platforms = {'windows', 'osx', 'linux'}
    can_be_disabled = False
    type = _('Output profile')

    name = 'Default Output Profile'
    short_name = 'default'  # Used in the CLI so don't use spaces etc. in it
    description = _('This profile tries to provide sane defaults and is useful '
                    'if you want to produce a document intended to be read at a '
                    'computer or on a range of devices.')

    #: The image size for comics
    comic_screen_size = (584, 754)

    #: If True the MOBI renderer on the device supports MOBI indexing
    supports_mobi_indexing = False

    #: If True output should be optimized for a touchscreen interface
    touchscreen = False
    touchscreen_news_css = ''
    #: A list of extra (beyond CSS 2.1) modules supported by the device
    #: Format is a css_parser profile dictionary (see iPad for example)
    extra_css_modules = []
    #: If True, the date is appended to the title of downloaded news
    periodical_date_in_title = True

    #: Characters used in jackets and catalogs
    ratings_char = '*'
    empty_ratings_char = ' '

    #: Unsupported unicode characters to be replaced during preprocessing
    unsupported_unicode_chars = []

    #: Number of ems that the left margin of a blockquote is rendered as
    mobi_ems_per_blockquote = 1.0

    #: Special periodical formatting needed in EPUB
    epub_periodical_format = None


class iPadOutput(OutputProfile):

    name = 'iPad'
    short_name = 'ipad'
    description = _('Intended for the iPad and similar devices with a '
                    'resolution of 768x1024')
    screen_size = (768, 1024)
    comic_screen_size = (768, 1024)
    dpi = 132.0
    extra_css_modules = [
        {
            'name': 'webkit',
            'props': {
                '-webkit-border-bottom-left-radius': '{length}',
                '-webkit-border-bottom-right-radius': '{length}',
                '-webkit-border-top-left-radius': '{length}',
                '-webkit-border-top-right-radius': '{length}',
                '-webkit-border-radius': r'{border-width}(\s+{border-width}){0,3}|inherit',
            },
            'macros': {'border-width': '{length}|medium|thick|thin'}
        }
    ]

    ratings_char = '\u2605'  # filled star
    empty_ratings_char = '\u2606'  # hollow star

    touchscreen = True
    # touchscreen_news_css {{{
    touchscreen_news_css = '''
        /* hr used in articles */
        .article_articles_list {
            width:18%;
        }
        .article_link {
            color: #593f29;
            font-style: italic;
        }
        .article_next {
            -webkit-border-top-right-radius:4px;
            -webkit-border-bottom-right-radius:4px;
            font-style: italic;
            width:32%;
        }

        .article_prev {
            -webkit-border-top-left-radius:4px;
            -webkit-border-bottom-left-radius:4px;
            font-style: italic;
            width:32%;
        }
        .article_sections_list {
            width:18%;
        }
        .articles_link {
            font-weight: bold;
        }
        .sections_link {
            font-weight: bold;
        }

        .caption_divider {
            border:#ccc 1px solid;
        }

        .touchscreen_navbar {
            background:#c3bab2;
            border:#ccc 0px solid;
            border-collapse:separate;
            border-spacing:1px;
            margin-left: 5%;
            margin-right: 5%;
            page-break-inside:avoid;
            width: 90%;
            -webkit-border-radius:4px;
        }
        .touchscreen_navbar td {
            background:#fff;
            font-family:Helvetica;
            font-size:80%;
            /* UI touchboxes use 8px padding */
            padding: 6px;
            text-align:center;
        }

        .touchscreen_navbar td a:link {
            color: #593f29;
            text-decoration: none;
        }

        /* Index formatting */
        .publish_date {
            text-align:center;
        }
        .divider {
            border-bottom:1em solid white;
            border-top:1px solid gray;
        }

        hr.caption_divider {
            border-color:black;
            border-style:solid;
            border-width:1px;
        }

        /* Feed summary formatting */
        .article_summary {
            display:inline-block;
            padding-bottom:0.5em;
        }
        .feed {
            font-family:sans-serif;
            font-weight:bold;
            font-size:larger;
        }

        .feed_link {
            font-style: italic;
        }

        .feed_next {
            -webkit-border-top-right-radius:4px;
            -webkit-border-bottom-right-radius:4px;
            font-style: italic;
            width:40%;
        }

        .feed_prev {
            -webkit-border-top-left-radius:4px;
            -webkit-border-bottom-left-radius:4px;
            font-style: italic;
            width:40%;
        }

        .feed_title {
            text-align: center;
            font-size: 160%;
        }

        .feed_up {
            font-weight: bold;
            width:20%;
        }

        .summary_headline {
            font-weight:bold;
            text-align:left;
        }

        .summary_byline {
            text-align:left;
            font-family:monospace;
        }

        .summary_text {
            text-align:left;
        }
        '''
    # }}}
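The `comic_screen_size` attribute above is the target canvas that comic pages are scaled to fit. A sketch of an aspect-preserving downscale-to-fit computation under the usual assumptions (this is an illustrative helper, not the pipeline's actual resizing code):

```python
def fit_image(width, height, pwidth, pheight):
    # Scale (width, height) down to fit inside (pwidth, pheight),
    # preserving aspect ratio. Images that already fit are untouched,
    # i.e. the image is never upscaled.
    scaled = width > pwidth or height > pheight
    if scaled:
        ratio = min(pwidth / width, pheight / height)
        width, height = int(width * ratio), int(height * ratio)
    return scaled, width, height


# A 2000x3000 comic page against the iPad profile's 768x1024 canvas above:
scaled, w, h = fit_image(2000, 3000, 768, 1024)
```

The height is the limiting dimension here, so the page shrinks to roughly 682x1024; a profile like `TabletOutput` below sidesteps this entirely by declaring a (10000, 10000) canvas.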
class iPad3Output(iPadOutput):

    screen_size = comic_screen_size = (2048, 1536)
    dpi = 264.0
    name = 'iPad 3'
    short_name = 'ipad3'
    description = _('Intended for the iPad 3 and similar devices with a '
                    'resolution of 1536x2048')


class TabletOutput(iPadOutput):
    name = 'Tablet'
    short_name = 'tablet'
    description = _('Intended for generic tablet devices, does no resizing of images')

    screen_size = (10000, 10000)
    comic_screen_size = (10000, 10000)


class SamsungGalaxy(TabletOutput):
    name = 'Samsung Galaxy'
    short_name = 'galaxy'
    description = _('Intended for the Samsung Galaxy and similar tablet devices with '
                    'a resolution of 600x1280')
    screen_size = comic_screen_size = (600, 1280)


class NookHD(TabletOutput):
    name = 'Nook HD+'
    short_name = 'nook_hd_plus'
    description = _('Intended for the Nook HD+ and similar tablet devices with '
                    'a resolution of 1280x1920')
    screen_size = comic_screen_size = (1280, 1920)


class SonyReaderOutput(OutputProfile):

    name = 'Sony Reader'
    short_name = 'sony'
    description = _('This profile is intended for the SONY PRS line. '
                    'The 500/505/600/700 etc.')

    screen_size = (590, 775)
    dpi = 168.451
    fbase = 12
    fsizes = [7.5, 9, 10, 12, 15.5, 20, 22, 24]
    unsupported_unicode_chars = [u'\u201f', u'\u201b']

    epub_periodical_format = 'sony'
    # periodical_date_in_title = False


class KoboReaderOutput(OutputProfile):

    name = 'Kobo Reader'
    short_name = 'kobo'

    description = _('This profile is intended for the Kobo Reader.')

    screen_size = (536, 710)
    comic_screen_size = (536, 710)
    dpi = 168.451
    fbase = 12
    fsizes = [7.5, 9, 10, 12, 15.5, 20, 22, 24]


class SonyReader300Output(SonyReaderOutput):

    author = 'John Schember'
    name = 'Sony Reader 300'
    short_name = 'sony300'
    description = _('This profile is intended for the SONY PRS-300.')

    dpi = 200


class SonyReader900Output(SonyReaderOutput):

    author = 'John Schember'
    name = 'Sony Reader 900'
    short_name = 'sony900'
    description = _('This profile is intended for the SONY PRS-900.')

    screen_size = (600, 999)
    comic_screen_size = screen_size


class SonyReaderT3Output(SonyReaderOutput):

    author = 'Kovid Goyal'
    name = 'Sony Reader T3'
    short_name = 'sonyt3'
    description = _('This profile is intended for the SONY PRS-T3.')

    screen_size = (758, 934)
    comic_screen_size = screen_size


class GenericEink(SonyReaderOutput):

    name = 'Generic e-ink'
    short_name = 'generic_eink'
    description = _('Suitable for use with any e-ink device')
    epub_periodical_format = None


class GenericEinkLarge(GenericEink):

    name = 'Generic e-ink large'
    short_name = 'generic_eink_large'
    description = _('Suitable for use with any large screen e-ink device')

    screen_size = (600, 999)
    comic_screen_size = screen_size


class GenericEinkHD(GenericEink):

    name = 'Generic e-ink HD'
    short_name = 'generic_eink_hd'
    description = _('Suitable for use with any modern high resolution e-ink device')

    screen_size = (10000, 10000)
    comic_screen_size = (10000, 10000)


class JetBook5Output(OutputProfile):

    name = 'JetBook 5-inch'
    short_name = 'jetbook5'
    description = _('This profile is intended for the 5-inch JetBook.')

    screen_size = (480, 640)
    dpi = 168.451


class SonyReaderLandscapeOutput(SonyReaderOutput):

    name = 'Sony Reader Landscape'
    short_name = 'sony-landscape'
    description = _('This profile is intended for the SONY PRS line. '
                    'The 500/505/700 etc, in landscape mode. Mainly useful '
                    'for comics.')

    screen_size = (784, 1012)
    comic_screen_size = (784, 1012)


class MSReaderOutput(OutputProfile):

    name = 'Microsoft Reader'
    short_name = 'msreader'
    description = _('This profile is intended for the Microsoft Reader.')

    screen_size = (480, 652)
    dpi = 96
    fbase = 13
    fsizes = [10, 11, 13, 16, 18, 20, 22, 26]


class MobipocketOutput(OutputProfile):

    name = 'Mobipocket Books'
    short_name = 'mobipocket'
    description = _('This profile is intended for the Mobipocket books.')

    # Unfortunately MOBI books are not narrowly targeted, so this information is
    # quite likely to be spurious
    screen_size = (600, 800)
    dpi = 96
    fbase = 18
    fsizes = [14, 14, 16, 18, 20, 22, 24, 26]


class HanlinV3Output(OutputProfile):

    name = 'Hanlin V3'
    short_name = 'hanlinv3'
    description = _('This profile is intended for the Hanlin V3 and its clones.')

    # Screen size is a best guess
    screen_size = (584, 754)
    dpi = 168.451
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]


class HanlinV5Output(HanlinV3Output):

    name = 'Hanlin V5'
    short_name = 'hanlinv5'
    description = _('This profile is intended for the Hanlin V5 and its clones.')

    dpi = 200


class CybookG3Output(OutputProfile):

    name = 'Cybook G3'
    short_name = 'cybookg3'
    description = _('This profile is intended for the Cybook G3.')

    # Screen size is a best guess
    screen_size = (600, 800)
    comic_screen_size = (600, 757)
    dpi = 168.451
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]


class CybookOpusOutput(SonyReaderOutput):

    author = 'John Schember'
    name = 'Cybook Opus'
    short_name = 'cybook_opus'
    description = _('This profile is intended for the Cybook Opus.')

    # Screen size is a best guess
    dpi = 200
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]

    epub_periodical_format = None


class KindleOutput(OutputProfile):

    name = 'Kindle'
    short_name = 'kindle'
    description = _('This profile is intended for the Amazon Kindle.')

    # Screen size is a best guess
    screen_size = (525, 640)
    dpi = 168.451
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]
    supports_mobi_indexing = True
    periodical_date_in_title = False

    empty_ratings_char = '\u2606'
    ratings_char = '\u2605'

    mobi_ems_per_blockquote = 2.0


class KindleDXOutput(OutputProfile):

    name = 'Kindle DX'
    short_name = 'kindle_dx'
    description = _('This profile is intended for the Amazon Kindle DX.')

    # Screen size is a best guess
    screen_size = (744, 1022)
    dpi = 150.0
    comic_screen_size = (771, 1116)
    # comic_screen_size = (741, 1022)
    supports_mobi_indexing = True
    periodical_date_in_title = False
    empty_ratings_char = '\u2606'
    ratings_char = '\u2605'
    mobi_ems_per_blockquote = 2.0


class KindlePaperWhiteOutput(KindleOutput):

    name = 'Kindle PaperWhite'
    short_name = 'kindle_pw'
    description = _('This profile is intended for the Amazon Kindle PaperWhite 1 and 2')

    # Screen size is a best guess
    screen_size = (658, 940)
    dpi = 212.0
    comic_screen_size = screen_size


class KindleVoyageOutput(KindleOutput):

    name = 'Kindle Voyage'
    short_name = 'kindle_voyage'
    description = _('This profile is intended for the Amazon Kindle Voyage')

    # Screen size is currently just the spec size, actual renderable area will
    # depend on someone with the device doing tests.
    screen_size = (1080, 1430)
    dpi = 300.0
    comic_screen_size = screen_size


class KindlePaperWhite3Output(KindleVoyageOutput):

    name = 'Kindle PaperWhite 3'
    short_name = 'kindle_pw3'
    description = _('This profile is intended for the Amazon Kindle PaperWhite 3 and above')
    # Screen size is currently just the spec size, actual renderable area will
    # depend on someone with the device doing tests.
    screen_size = (1072, 1430)
    dpi = 300.0
    comic_screen_size = screen_size


class KindleOasisOutput(KindlePaperWhite3Output):

    name = 'Kindle Oasis'
    short_name = 'kindle_oasis'
    description = _('This profile is intended for the Amazon Kindle Oasis 2017 and above')
    # Screen size is currently just the spec size, actual renderable area will
    # depend on someone with the device doing tests.
    screen_size = (1264, 1680)
    dpi = 300.0
    comic_screen_size = screen_size


class KindleFireOutput(KindleDXOutput):

    name = 'Kindle Fire'
    short_name = 'kindle_fire'
    description = _('This profile is intended for the Amazon Kindle Fire.')

    screen_size = (570, 1016)
    dpi = 169.0
    comic_screen_size = (570, 1016)


class IlliadOutput(OutputProfile):

    name = 'Illiad'
    short_name = 'illiad'
    description = _('This profile is intended for the Irex Illiad.')

    screen_size = (760, 925)
    comic_screen_size = (760, 925)
    dpi = 160.0
    fbase = 12
    fsizes = [7.5, 9, 10, 12, 15.5, 20, 22, 24]


class IRexDR1000Output(OutputProfile):

    author = 'John Schember'
    name = 'IRex Digital Reader 1000'
    short_name = 'irexdr1000'
    description = _('This profile is intended for the IRex Digital Reader 1000.')

    # Screen size is a best guess
    screen_size = (1024, 1280)
    comic_screen_size = (996, 1241)
    dpi = 160
    fbase = 16
    fsizes = [12, 14, 16, 18, 20, 22, 24]


class IRexDR800Output(OutputProfile):

    author = 'Eric Cronin'
    name = 'IRex Digital Reader 800'
    short_name = 'irexdr800'
    description = _('This profile is intended for the IRex Digital Reader 800.')

    # Screen size is a best guess
    screen_size = (768, 1024)
    comic_screen_size = (768, 1024)
    dpi = 160
    fbase = 16
    fsizes = [12, 14, 16, 18, 20, 22, 24]


class NookOutput(OutputProfile):

    author = 'John Schember'
    name = 'Nook'
    short_name = 'nook'
    description = _('This profile is intended for the B&N Nook.')

    # Screen size is a best guess
    screen_size = (600, 730)
    comic_screen_size = (584, 730)
    dpi = 167
    fbase = 16
    fsizes = [12, 12, 14, 16, 18, 20, 22, 24]


class NookColorOutput(NookOutput):
    name = 'Nook Color'
    short_name = 'nook_color'
    description = _('This profile is intended for the B&N Nook Color.')

    screen_size = (600, 900)
    comic_screen_size = (594, 900)
    dpi = 169


class PocketBook900Output(OutputProfile):

    author = 'Chris Lockfort'
    name = 'PocketBook Pro 900'
    short_name = 'pocketbook_900'
    description = _('This profile is intended for the PocketBook Pro 900 series of devices.')

    screen_size = (810, 1180)
    dpi = 150.0
    comic_screen_size = screen_size


class PocketBookPro912Output(OutputProfile):

    author = 'Daniele Pizzolli'
    name = 'PocketBook Pro 912'
    short_name = 'pocketbook_pro_912'
    description = _('This profile is intended for the PocketBook Pro 912 series of devices.')

    # According to http://download.pocketbook-int.com/user-guides/E_Ink/912/User_Guide_PocketBook_912(EN).pdf
    screen_size = (825, 1200)
|
||||||
|
dpi = 155.0
|
||||||
|
comic_screen_size = screen_size
|
||||||
|
|
||||||
|
|
||||||
|
output_profiles = [
|
||||||
|
OutputProfile, SonyReaderOutput, SonyReader300Output, SonyReader900Output,
|
||||||
|
SonyReaderT3Output, MSReaderOutput, MobipocketOutput, HanlinV3Output,
|
||||||
|
HanlinV5Output, CybookG3Output, CybookOpusOutput, KindleOutput, iPadOutput,
|
||||||
|
iPad3Output, KoboReaderOutput, TabletOutput, SamsungGalaxy,
|
||||||
|
SonyReaderLandscapeOutput, KindleDXOutput, IlliadOutput, NookHD,
|
||||||
|
IRexDR1000Output, IRexDR800Output, JetBook5Output, NookOutput,
|
||||||
|
NookColorOutput, PocketBook900Output,
|
||||||
|
PocketBookPro912Output, GenericEink, GenericEinkLarge, GenericEinkHD,
|
||||||
|
KindleFireOutput, KindlePaperWhiteOutput, KindleVoyageOutput,
|
||||||
|
KindlePaperWhite3Output, KindleOasisOutput
|
||||||
|
]
|
||||||
|
|
||||||
|
output_profiles.sort(key=lambda x: x.name.lower())
|
||||||
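The profile registry above collects every profile class in a single list and orders it case-insensitively by the class-level `name` attribute. A minimal standalone sketch of the same pattern (the `Profile` classes below are hypothetical stand-ins for the `OutputProfile` subclasses):

```python
# Hypothetical stand-ins for the OutputProfile subclasses above.
class Profile:
    name = 'Generic e-ink'

class KindleProfile(Profile):
    name = 'Kindle'

class IlliadProfile(Profile):
    name = 'Illiad'

# The registry holds classes, not instances; sorting by the lowercased
# class attribute gives a stable, case-insensitive ordering for display.
profiles = [Profile, KindleProfile, IlliadProfile]
profiles.sort(key=lambda x: x.name.lower())

assert [p.name for p in profiles] == ['Generic e-ink', 'Illiad', 'Kindle']
```

Keeping classes rather than instances in the list lets callers instantiate a profile only when it is actually selected.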
835
ebook_converter/customize/ui.py
Normal file
@@ -0,0 +1,835 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'

import os, shutil, traceback, functools, sys
from collections import defaultdict
from itertools import chain

from calibre.customize import (CatalogPlugin, FileTypePlugin, PluginNotFound,
                               MetadataReaderPlugin, MetadataWriterPlugin,
                               InterfaceActionBase as InterfaceAction,
                               PreferencesPlugin, platform, InvalidPlugin,
                               StoreBase as Store, EditBookToolPlugin,
                               LibraryClosedPlugin)
from calibre.customize.conversion import InputFormatPlugin, OutputFormatPlugin
from calibre.customize.zipplugin import loader
from calibre.customize.profiles import InputProfile, OutputProfile
from calibre.customize.builtins import plugins as builtin_plugins
from calibre.devices.interface import DevicePlugin
from calibre.ebooks.metadata import MetaInformation
from calibre.utils.config import (make_config_dir, Config, ConfigProxy,
                                  plugin_dir, OptionParser)
from calibre.ebooks.metadata.sources.base import Source
from calibre.constants import DEBUG, numeric_version
from polyglot.builtins import iteritems, itervalues, unicode_type

builtin_names = frozenset(p.name for p in builtin_plugins)
BLACKLISTED_PLUGINS = frozenset({'Marvin XD', 'iOS reader applications'})


class NameConflict(ValueError):
    pass


def _config():
    c = Config('customize')
    c.add_opt('plugins', default={}, help=_('Installed plugins'))
    c.add_opt('filetype_mapping', default={}, help=_('Mapping for filetype plugins'))
    c.add_opt('plugin_customization', default={}, help=_('Local plugin customization'))
    c.add_opt('disabled_plugins', default=set(), help=_('Disabled plugins'))
    c.add_opt('enabled_plugins', default=set(), help=_('Enabled plugins'))

    return ConfigProxy(c)


config = _config()


def find_plugin(name):
    for plugin in _initialized_plugins:
        if plugin.name == name:
            return plugin


def load_plugin(path_to_zip_file):  # {{{
    '''
    Load plugin from ZIP file or raise InvalidPlugin error

    :return: A :class:`Plugin` instance.
    '''
    return loader.load(path_to_zip_file)

# }}}

# Enable/disable plugins {{{


def disable_plugin(plugin_or_name):
    x = getattr(plugin_or_name, 'name', plugin_or_name)
    plugin = find_plugin(x)
    if not plugin.can_be_disabled:
        raise ValueError('Plugin %s cannot be disabled' % x)
    dp = config['disabled_plugins']
    dp.add(x)
    config['disabled_plugins'] = dp
    ep = config['enabled_plugins']
    if x in ep:
        ep.remove(x)
    config['enabled_plugins'] = ep


def enable_plugin(plugin_or_name):
    x = getattr(plugin_or_name, 'name', plugin_or_name)
    dp = config['disabled_plugins']
    if x in dp:
        dp.remove(x)
    config['disabled_plugins'] = dp
    ep = config['enabled_plugins']
    ep.add(x)
    config['enabled_plugins'] = ep


def restore_plugin_state_to_default(plugin_or_name):
    x = getattr(plugin_or_name, 'name', plugin_or_name)
    dp = config['disabled_plugins']
    if x in dp:
        dp.remove(x)
    config['disabled_plugins'] = dp
    ep = config['enabled_plugins']
    if x in ep:
        ep.remove(x)
    config['enabled_plugins'] = ep


default_disabled_plugins = {
    'Overdrive', 'Douban Books', 'OZON.ru', 'Edelweiss', 'Google Images', 'Big Book Search',
}


def is_disabled(plugin):
    if plugin.name in config['enabled_plugins']:
        return False
    return plugin.name in config['disabled_plugins'] or \
        plugin.name in default_disabled_plugins
# }}}
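The helpers above implement a three-way state for each plugin: explicitly enabled, explicitly disabled, or disabled by default, with an explicit enable taking precedence. A self-contained sketch of that precedence, with a plain dict and name strings standing in for calibre's config and plugin objects:

```python
# Hypothetical standalone sketch of the enable/disable bookkeeping.
DEFAULT_DISABLED = {'Overdrive'}

config = {'disabled_plugins': set(), 'enabled_plugins': set()}

def is_disabled(name):
    # An explicit enable always wins over both the disabled set
    # and the disabled-by-default set.
    if name in config['enabled_plugins']:
        return False
    return name in config['disabled_plugins'] or name in DEFAULT_DISABLED

def enable_plugin(name):
    config['disabled_plugins'].discard(name)
    config['enabled_plugins'].add(name)

assert is_disabled('Overdrive')       # disabled by default
enable_plugin('Overdrive')
assert not is_disabled('Overdrive')   # explicit enable overrides the default
```

The separate `enabled_plugins` set is what lets a user turn on a plugin that ships disabled by default without editing the default list.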

# File type plugins {{{


_on_import = {}
_on_postimport = {}
_on_preprocess = {}
_on_postprocess = {}
_on_postadd = []


def reread_filetype_plugins():
    global _on_import, _on_postimport, _on_preprocess, _on_postprocess, _on_postadd
    _on_import = defaultdict(list)
    _on_postimport = defaultdict(list)
    _on_preprocess = defaultdict(list)
    _on_postprocess = defaultdict(list)
    _on_postadd = []

    for plugin in _initialized_plugins:
        if isinstance(plugin, FileTypePlugin):
            for ft in plugin.file_types:
                if plugin.on_import:
                    _on_import[ft].append(plugin)
                if plugin.on_postimport:
                    _on_postimport[ft].append(plugin)
                    _on_postadd.append(plugin)
                if plugin.on_preprocess:
                    _on_preprocess[ft].append(plugin)
                if plugin.on_postprocess:
                    _on_postprocess[ft].append(plugin)


def plugins_for_ft(ft, occasion):
    op = {
        'import': _on_import, 'preprocess': _on_preprocess, 'postprocess': _on_postprocess, 'postimport': _on_postimport,
    }[occasion]
    for p in chain(op.get(ft, ()), op.get('*', ())):
        if not is_disabled(p):
            yield p
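`plugins_for_ft` dispatches on both the concrete file extension and a `'*'` wildcard key, yielding extension-specific plugins first. A small sketch of that lookup order (the registry contents below are illustrative names, not real plugins):

```python
from itertools import chain

# Hypothetical per-filetype registry: concrete extensions map to the
# plugins registered for them; '*' collects plugins that run for every
# filetype. Lookup order is concrete first, then wildcard.
registry = {
    'epub': ['epub-fixer'],
    '*': ['universal-logger'],
}

def plugins_for(ft):
    return list(chain(registry.get(ft, ()), registry.get('*', ())))

assert plugins_for('epub') == ['epub-fixer', 'universal-logger']
assert plugins_for('mobi') == ['universal-logger']  # wildcard only
```

Using `dict.get(key, ())` keeps the lookup safe for extensions with no registered plugins, without mutating the registry the way a bare `defaultdict` access would.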


def _run_filetype_plugins(path_to_file, ft=None, occasion='preprocess'):
    customization = config['plugin_customization']
    if ft is None:
        ft = os.path.splitext(path_to_file)[-1].lower().replace('.', '')
    nfp = path_to_file
    for plugin in plugins_for_ft(ft, occasion):
        plugin.site_customization = customization.get(plugin.name, '')
        oo, oe = sys.stdout, sys.stderr  # Some file type plugins out there override the output streams with buggy implementations
        with plugin:
            try:
                plugin.original_path_to_file = path_to_file
            except Exception:
                pass
            try:
                nfp = plugin.run(nfp) or nfp
            except:
                print('Running file type plugin %s failed with traceback:' % plugin.name, file=oe)
                traceback.print_exc(file=oe)
        sys.stdout, sys.stderr = oo, oe
    x = lambda j: os.path.normpath(os.path.normcase(j))
    if occasion == 'postprocess' and x(nfp) != x(path_to_file):
        shutil.copyfile(nfp, path_to_file)
        nfp = path_to_file
    return nfp


run_plugins_on_import = functools.partial(_run_filetype_plugins, occasion='import')
run_plugins_on_preprocess = functools.partial(_run_filetype_plugins, occasion='preprocess')
run_plugins_on_postprocess = functools.partial(_run_filetype_plugins, occasion='postprocess')
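The core of `_run_filetype_plugins` is a fault-tolerant pipeline: each plugin may return a new file path, return a falsy value to leave the current path unchanged, or raise, and a failure in one plugin must not stop the rest. A self-contained sketch of that chaining (the plugins below are hypothetical callables, not the calibre plugin objects):

```python
# Sketch of the plugin chaining pattern: `nfp = plugin(nfp) or nfp`
# threads the (possibly rewritten) path through every plugin in turn.
def run_chain(path, plugins):
    nfp = path
    for plugin in plugins:
        try:
            nfp = plugin(nfp) or nfp
        except Exception:
            pass  # a misbehaving plugin is skipped, never fatal
    return nfp

plugins = [
    lambda p: p + '.fixed',   # rewrote the file, returns the new path
    lambda p: None,           # did nothing: falsy, so keep current path
    lambda p: 1 / 0,          # buggy plugin: exception is swallowed
]
assert run_chain('book.epub', plugins) == 'book.epub.fixed'
```

The `or nfp` fallback is what lets a plugin signal "no change" simply by returning `None`.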


def run_plugins_on_postimport(db, book_id, fmt):
    customization = config['plugin_customization']
    fmt = fmt.lower()
    for plugin in plugins_for_ft(fmt, 'postimport'):
        plugin.site_customization = customization.get(plugin.name, '')
        with plugin:
            try:
                plugin.postimport(book_id, fmt, db)
            except:
                print('Running file type plugin %s failed with traceback:' %
                      plugin.name)
                traceback.print_exc()


def run_plugins_on_postadd(db, book_id, fmt_map):
    customization = config['plugin_customization']
    for plugin in _on_postadd:
        if is_disabled(plugin):
            continue
        plugin.site_customization = customization.get(plugin.name, '')
        with plugin:
            try:
                plugin.postadd(book_id, fmt_map, db)
            except Exception:
                print('Running file type plugin %s failed with traceback:' %
                      plugin.name)
                traceback.print_exc()

# }}}

# Plugin customization {{{


def customize_plugin(plugin, custom):
    d = config['plugin_customization']
    d[plugin.name] = custom.strip()
    config['plugin_customization'] = d


def plugin_customization(plugin):
    return config['plugin_customization'].get(plugin.name, '')

# }}}

# Input/Output profiles {{{


def input_profiles():
    for plugin in _initialized_plugins:
        if isinstance(plugin, InputProfile):
            yield plugin


def output_profiles():
    for plugin in _initialized_plugins:
        if isinstance(plugin, OutputProfile):
            yield plugin
# }}}

# Interface Actions # {{{


def interface_actions():
    customization = config['plugin_customization']
    for plugin in _initialized_plugins:
        if isinstance(plugin, InterfaceAction):
            if not is_disabled(plugin):
                plugin.site_customization = customization.get(plugin.name, '')
                yield plugin
# }}}

# Preferences Plugins # {{{


def preferences_plugins():
    customization = config['plugin_customization']
    for plugin in _initialized_plugins:
        if isinstance(plugin, PreferencesPlugin):
            if not is_disabled(plugin):
                plugin.site_customization = customization.get(plugin.name, '')
                yield plugin
# }}}

# Library Closed Plugins # {{{


def available_library_closed_plugins():
    customization = config['plugin_customization']
    for plugin in _initialized_plugins:
        if isinstance(plugin, LibraryClosedPlugin):
            if not is_disabled(plugin):
                plugin.site_customization = customization.get(plugin.name, '')
                yield plugin


def has_library_closed_plugins():
    for plugin in _initialized_plugins:
        if isinstance(plugin, LibraryClosedPlugin):
            if not is_disabled(plugin):
                return True
    return False
# }}}

# Store Plugins # {{{


def store_plugins():
    customization = config['plugin_customization']
    for plugin in _initialized_plugins:
        if isinstance(plugin, Store):
            plugin.site_customization = customization.get(plugin.name, '')
            yield plugin


def available_store_plugins():
    for plugin in store_plugins():
        if not is_disabled(plugin):
            yield plugin


def stores():
    stores = set()
    for plugin in store_plugins():
        stores.add(plugin.name)
    return stores


def available_stores():
    stores = set()
    for plugin in available_store_plugins():
        stores.add(plugin.name)
    return stores

# }}}

# Metadata read/write {{{


_metadata_readers = {}
_metadata_writers = {}


def reread_metadata_plugins():
    global _metadata_readers
    global _metadata_writers
    _metadata_readers = defaultdict(list)
    _metadata_writers = defaultdict(list)
    for plugin in _initialized_plugins:
        if isinstance(plugin, MetadataReaderPlugin):
            for ft in plugin.file_types:
                _metadata_readers[ft].append(plugin)
        elif isinstance(plugin, MetadataWriterPlugin):
            for ft in plugin.file_types:
                _metadata_writers[ft].append(plugin)

    # Ensure custom metadata plugins are used in preference to builtin
    # ones for a given filetype
    def key(plugin):
        return (1 if plugin.plugin_path is None else 0), plugin.name

    for group in (_metadata_readers, _metadata_writers):
        for plugins in itervalues(group):
            if len(plugins) > 1:
                plugins.sort(key=key)
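The sort key at the end of `reread_metadata_plugins` is a two-element tuple: external plugins (those with a `plugin_path`) get rank 0 and builtins get rank 1, with the name as a tiebreaker, so a custom plugin always wins for its filetype. A standalone sketch of that ordering (the `P` class is a hypothetical stand-in for calibre's plugin objects):

```python
# Hypothetical stand-in for a plugin object: builtins have
# plugin_path=None, ZIP-installed plugins have a real path.
class P:
    def __init__(self, name, plugin_path):
        self.name = name
        self.plugin_path = plugin_path

def key(plugin):
    # (0, name) for external plugins, (1, name) for builtins:
    # external plugins sort first and therefore take precedence.
    return (1 if plugin.plugin_path is None else 0), plugin.name

plugins = [P('builtin-epub', None), P('my-epub', '/plugins/my.zip')]
plugins.sort(key=key)
assert [p.name for p in plugins] == ['my-epub', 'builtin-epub']
```

Because downstream code takes the first plugin in the list for a filetype, this single sort is the whole precedence mechanism.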


def metadata_readers():
    ans = set()
    for plugins in _metadata_readers.values():
        for plugin in plugins:
            ans.add(plugin)
    return ans


def metadata_writers():
    ans = set()
    for plugins in _metadata_writers.values():
        for plugin in plugins:
            ans.add(plugin)
    return ans


class QuickMetadata(object):

    def __init__(self):
        self.quick = False

    def __enter__(self):
        self.quick = True

    def __exit__(self, *args):
        self.quick = False


quick_metadata = QuickMetadata()


class ApplyNullMetadata(object):

    def __init__(self):
        self.apply_null = False

    def __enter__(self):
        self.apply_null = True

    def __exit__(self, *args):
        self.apply_null = False


apply_null_metadata = ApplyNullMetadata()


class ForceIdentifiers(object):

    def __init__(self):
        self.force_identifiers = False

    def __enter__(self):
        self.force_identifiers = True

    def __exit__(self, *args):
        self.force_identifiers = False


force_identifiers = ForceIdentifiers()
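`QuickMetadata`, `ApplyNullMetadata` and `ForceIdentifiers` all follow the same pattern: a module-level singleton whose boolean attribute is flipped for the duration of a `with` block, so code deep in the call stack can read the flag without it being threaded through every signature. A generic sketch of the pattern:

```python
# Sketch of the flag-as-context-manager pattern used above.
class Flag(object):
    def __init__(self):
        self.active = False

    def __enter__(self):
        self.active = True

    def __exit__(self, *args):
        # Runs even when the body raises, so the flag cannot leak.
        self.active = False

quick_metadata = Flag()

assert not quick_metadata.active
with quick_metadata:
    assert quick_metadata.active   # callees read the shared singleton
assert not quick_metadata.active
```

Note the pattern is neither re-entrant nor thread-safe: a nested or concurrent `with` block on the same singleton clears the flag on first exit.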


def get_file_type_metadata(stream, ftype):
    mi = MetaInformation(None, None)

    ftype = ftype.lower().strip()
    if ftype in _metadata_readers:
        for plugin in _metadata_readers[ftype]:
            if not is_disabled(plugin):
                with plugin:
                    try:
                        plugin.quick = quick_metadata.quick
                        if hasattr(stream, 'seek'):
                            stream.seek(0)
                        mi = plugin.get_metadata(stream, ftype.lower().strip())
                        break
                    except:
                        traceback.print_exc()
                        continue
    return mi


def set_file_type_metadata(stream, mi, ftype, report_error=None):
    ftype = ftype.lower().strip()
    if ftype in _metadata_writers:
        customization = config['plugin_customization']
        for plugin in _metadata_writers[ftype]:
            if not is_disabled(plugin):
                with plugin:
                    try:
                        plugin.apply_null = apply_null_metadata.apply_null
                        plugin.force_identifiers = force_identifiers.force_identifiers
                        plugin.site_customization = customization.get(plugin.name, '')
                        plugin.set_metadata(stream, mi, ftype.lower().strip())
                        break
                    except:
                        if report_error is None:
                            from calibre import prints
                            prints('Failed to set metadata for the', ftype.upper(), 'format of:', getattr(mi, 'title', ''), file=sys.stderr)
                            traceback.print_exc()
                        else:
                            report_error(mi, ftype, traceback.format_exc())


def can_set_metadata(ftype):
    ftype = ftype.lower().strip()
    for plugin in _metadata_writers.get(ftype, ()):
        if not is_disabled(plugin):
            return True
    return False

# }}}

# Add/remove plugins {{{


def add_plugin(path_to_zip_file):
    make_config_dir()
    plugin = load_plugin(path_to_zip_file)
    if plugin.name in builtin_names:
        raise NameConflict(
            'A builtin plugin with the name %r already exists' % plugin.name)
    plugin = initialize_plugin(plugin, path_to_zip_file)
    plugins = config['plugins']
    zfp = os.path.join(plugin_dir, plugin.name+'.zip')
    if os.path.exists(zfp):
        os.remove(zfp)
    shutil.copyfile(path_to_zip_file, zfp)
    plugins[plugin.name] = zfp
    config['plugins'] = plugins
    initialize_plugins()
    return plugin


def remove_plugin(plugin_or_name):
    name = getattr(plugin_or_name, 'name', plugin_or_name)
    plugins = config['plugins']
    removed = False
    if name in plugins:
        removed = True
        try:
            zfp = os.path.join(plugin_dir, name+'.zip')
            if os.path.exists(zfp):
                os.remove(zfp)
            zfp = plugins[name]
            if os.path.exists(zfp):
                os.remove(zfp)
        except:
            pass
        plugins.pop(name)
    config['plugins'] = plugins
    initialize_plugins()
    return removed

# }}}
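`add_plugin` and `remove_plugin` together maintain a simple install registry: the ZIP is copied into a private plugin directory and recorded in a name-to-path mapping, and removal deletes the copy and the mapping entry. A self-contained sketch of that bookkeeping (a plain dict and a temp directory stand in for calibre's config and `plugin_dir`):

```python
import os
import shutil
import tempfile

# Hypothetical plugin directory and registry standing in for
# calibre's plugin_dir and config['plugins'].
plugin_dir = tempfile.mkdtemp()
plugins = {}

def add_plugin(name, path_to_zip_file):
    # Copy the ZIP into the managed directory and record its location.
    zfp = os.path.join(plugin_dir, name + '.zip')
    shutil.copyfile(path_to_zip_file, zfp)
    plugins[name] = zfp

def remove_plugin(name):
    # Delete both the installed copy and the registry entry.
    if name in plugins:
        zfp = plugins.pop(name)
        if os.path.exists(zfp):
            os.remove(zfp)
        return True
    return False

src = os.path.join(plugin_dir, 'src.zip')
open(src, 'wb').close()          # pretend this is a plugin ZIP
add_plugin('Demo', src)
assert remove_plugin('Demo')     # first removal succeeds
assert not remove_plugin('Demo')  # second is a no-op
```

The real functions additionally validate the plugin against builtin names and re-run full plugin initialization after every change.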
|
||||||
|
# Input/Output format plugins {{{
|
||||||
|
|
||||||
|
|
||||||
|
def input_format_plugins():
|
||||||
|
for plugin in _initialized_plugins:
|
||||||
|
if isinstance(plugin, InputFormatPlugin):
|
||||||
|
yield plugin
|
||||||
|
|
||||||
|
|
||||||
|
def plugin_for_input_format(fmt):
|
||||||
|
customization = config['plugin_customization']
|
||||||
|
for plugin in input_format_plugins():
|
||||||
|
if fmt.lower() in plugin.file_types:
|
||||||
|
plugin.site_customization = customization.get(plugin.name, None)
|
||||||
|
return plugin
|
||||||
|
|
||||||
|
|
||||||
|
def all_input_formats():
|
||||||
|
formats = set()
|
||||||
|
for plugin in input_format_plugins():
|
||||||
|
for format in plugin.file_types:
|
||||||
|
formats.add(format)
|
||||||
|
return formats
|
||||||
|
|
||||||
|
|
||||||
|
def available_input_formats():
|
||||||
|
formats = set()
|
||||||
|
for plugin in input_format_plugins():
|
||||||
|
if not is_disabled(plugin):
|
||||||
|
for format in plugin.file_types:
|
||||||
|
formats.add(format)
|
||||||
|
formats.add('zip'), formats.add('rar')
|
||||||
|
return formats
|
||||||
|
|
||||||
|
|
||||||
|
def output_format_plugins():
|
||||||
|
for plugin in _initialized_plugins:
|
||||||
|
if isinstance(plugin, OutputFormatPlugin):
|
||||||
|
yield plugin
|
||||||
|
|
||||||
|
|
||||||
|
def plugin_for_output_format(fmt):
|
||||||
|
customization = config['plugin_customization']
|
||||||
|
for plugin in output_format_plugins():
|
||||||
|
if fmt.lower() == plugin.file_type:
|
||||||
|
plugin.site_customization = customization.get(plugin.name, None)
|
||||||
|
return plugin
|
||||||
|
|
||||||
|
|
||||||
|
def available_output_formats():
|
||||||
|
formats = set()
|
||||||
|
for plugin in output_format_plugins():
|
||||||
|
if not is_disabled(plugin):
|
||||||
|
formats.add(plugin.file_type)
|
||||||
|
return formats
|
||||||
|
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
# Catalog plugins {{{
|
||||||
|
|
||||||
|
|
||||||
|
def catalog_plugins():
|
||||||
|
for plugin in _initialized_plugins:
|
||||||
|
if isinstance(plugin, CatalogPlugin):
|
||||||
|
yield plugin
|
||||||
|
|
||||||
|
|
||||||
|
def available_catalog_formats():
|
||||||
|
formats = set()
|
||||||
|
for plugin in catalog_plugins():
|
||||||
|
if not is_disabled(plugin):
|
||||||
|
for format in plugin.file_types:
|
||||||
|
formats.add(format)
|
||||||
|
return formats
|
||||||
|
|
||||||
|
|
||||||
|
def plugin_for_catalog_format(fmt):
|
||||||
|
for plugin in catalog_plugins():
|
||||||
|
if fmt.lower() in plugin.file_types:
|
||||||
|
return plugin
|
||||||
|
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
# Device plugins {{{
|
||||||
|
|
||||||
|
|
||||||
|
def device_plugins(include_disabled=False):
|
||||||
|
for plugin in _initialized_plugins:
|
||||||
|
if isinstance(plugin, DevicePlugin):
|
||||||
|
if include_disabled or not is_disabled(plugin):
|
||||||
|
if platform in plugin.supported_platforms:
|
||||||
|
if getattr(plugin, 'plugin_needs_delayed_initialization',
|
||||||
|
False):
|
||||||
|
plugin.do_delayed_plugin_initialization()
|
||||||
|
yield plugin
|
||||||
|
|
||||||
|
|
||||||
|
def disabled_device_plugins():
|
||||||
|
for plugin in _initialized_plugins:
|
||||||
|
if isinstance(plugin, DevicePlugin):
|
||||||
|
if is_disabled(plugin):
|
||||||
|
if platform in plugin.supported_platforms:
|
||||||
|
yield plugin
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
# Metadata sources2 {{{
|
||||||
|
|
||||||
|
|
||||||
|
def metadata_plugins(capabilities):
|
||||||
|
capabilities = frozenset(capabilities)
|
||||||
|
for plugin in all_metadata_plugins():
|
||||||
|
if plugin.capabilities.intersection(capabilities) and \
|
||||||
|
not is_disabled(plugin):
|
||||||
|
yield plugin
|
||||||
|
|
||||||
|
|
||||||
|
def all_metadata_plugins():
|
||||||
|
for plugin in _initialized_plugins:
|
||||||
|
if isinstance(plugin, Source):
|
||||||
|
yield plugin
|
||||||
|
|
||||||
|
|
||||||
|
def patch_metadata_plugins(possibly_updated_plugins):
|
||||||
|
patches = {}
|
||||||
|
for i, plugin in enumerate(_initialized_plugins):
|
||||||
|
if isinstance(plugin, Source) and plugin.name in builtin_names:
|
||||||
|
pup = possibly_updated_plugins.get(plugin.name)
|
||||||
|
if pup is not None:
|
||||||
|
if pup.version > plugin.version and pup.minimum_calibre_version <= numeric_version:
|
||||||
|
patches[i] = pup(None)
|
||||||
|
# Metadata source plugins dont use initialize() but that
|
||||||
|
# might change in the future, so be safe.
|
||||||
|
patches[i].initialize()
|
||||||
|
for i, pup in iteritems(patches):
|
||||||
|
_initialized_plugins[i] = pup
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
# Editor plugins {{{
|
||||||
|
|
||||||
|
|
||||||
|
def all_edit_book_tool_plugins():
|
||||||
|
for plugin in _initialized_plugins:
|
||||||
|
if isinstance(plugin, EditBookToolPlugin):
|
||||||
|
yield plugin
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
# Initialize plugins {{{
|
||||||
|
|
||||||
|
|
||||||
|
_initialized_plugins = []
|
||||||
|
|
||||||
|
|
||||||
|
def initialize_plugin(plugin, path_to_zip_file):
|
||||||
|
try:
|
||||||
|
p = plugin(path_to_zip_file)
|
||||||
|
p.initialize()
|
||||||
|
return p
|
||||||
|
except Exception:
|
||||||
|
print('Failed to initialize plugin:', plugin.name, plugin.version)
|
||||||
|
tb = traceback.format_exc()
|
||||||
|
raise InvalidPlugin((_('Initialization of plugin %s failed with traceback:')
|
||||||
|
%tb) + '\n'+tb)
|
||||||
|
|
||||||
|
|
||||||
|
def has_external_plugins():
|
||||||
|
'True if there are updateable (ZIP file based) plugins'
|
||||||
|
return bool(config['plugins'])
|
||||||
|
|
||||||
|
|
||||||
|
def initialize_plugins(perf=False):
|
||||||
|
global _initialized_plugins
|
||||||
|
_initialized_plugins = []
|
||||||
|
conflicts = [name for name in config['plugins'] if name in
|
||||||
|
builtin_names]
|
||||||
|
for p in conflicts:
|
||||||
|
remove_plugin(p)
|
||||||
|
external_plugins = config['plugins'].copy()
|
||||||
|
for name in BLACKLISTED_PLUGINS:
|
||||||
|
external_plugins.pop(name, None)
|
||||||
|
ostdout, ostderr = sys.stdout, sys.stderr
|
||||||
|
if perf:
|
||||||
|
from collections import defaultdict
|
||||||
|
import time
|
||||||
|
times = defaultdict(lambda:0)
|
||||||
|
for zfp in list(external_plugins) + builtin_plugins:
|
||||||
|
try:
|
||||||
|
if not isinstance(zfp, type):
|
||||||
|
# We have a plugin name
|
||||||
|
pname = zfp
|
||||||
|
zfp = os.path.join(plugin_dir, zfp+'.zip')
|
||||||
|
if not os.path.exists(zfp):
|
||||||
|
zfp = external_plugins[pname]
|
||||||
|
try:
|
||||||
|
plugin = load_plugin(zfp) if not isinstance(zfp, type) else zfp
|
||||||
|
except PluginNotFound:
|
||||||
|
continue
|
||||||
|
if perf:
|
||||||
|
st = time.time()
|
||||||
|
plugin = initialize_plugin(plugin, None if isinstance(zfp, type) else zfp)
|
||||||
|
if perf:
|
||||||
|
times[plugin.name] = time.time() - st
|
||||||
|
_initialized_plugins.append(plugin)
|
||||||
|
except:
|
||||||
|
print('Failed to initialize plugin:', repr(zfp))
|
||||||
|
if DEBUG:
|
||||||
|
traceback.print_exc()
|
||||||
|
# Prevent a custom plugin from overriding stdout/stderr as this breaks
|
||||||
|
# ipython
|
||||||
|
sys.stdout, sys.stderr = ostdout, ostderr
|
||||||
|
if perf:
|
||||||
|
for x in sorted(times, key=lambda x: times[x]):
|
||||||
|
print('%50s: %.3f'%(x, times[x]))
|
||||||
|
_initialized_plugins.sort(key=lambda x: x.priority, reverse=True)
|
||||||
|
reread_filetype_plugins()
|
||||||
|
reread_metadata_plugins()
|
||||||
|
|
||||||
|
|
||||||
|
initialize_plugins()
|
||||||
|
|
||||||
|
|
||||||
|
def initialized_plugins():
|
||||||
|
for plugin in _initialized_plugins:
|
||||||
|
yield plugin
|
||||||
|
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
# CLI {{{
|
||||||
|
|
||||||
|
|
||||||
|
def build_plugin(path):
|
||||||
|
from calibre import prints
|
||||||
|
from calibre.ptempfile import PersistentTemporaryFile
|
||||||
|
from calibre.utils.zipfile import ZipFile, ZIP_STORED
|
||||||
|
path = unicode_type(path)
|
||||||
|
names = frozenset(os.listdir(path))
|
||||||
|
if '__init__.py' not in names:
|
||||||
|
prints(path, ' is not a valid plugin')
|
||||||
|
raise SystemExit(1)
|
||||||
|
t = PersistentTemporaryFile(u'.zip')
|
||||||
|
with ZipFile(t, 'w', ZIP_STORED) as zf:
|
||||||
|
zf.add_dir(path, simple_filter=lambda x:x in {'.git', '.bzr', '.svn', '.hg'})
|
||||||
|
t.close()
|
||||||
|
plugin = add_plugin(t.name)
|
||||||
|
os.remove(t.name)
|
||||||
|
prints('Plugin updated:', plugin.name, plugin.version)
|
||||||
|
|
||||||
|
|
||||||
|
def option_parser():
|
||||||
|
parser = OptionParser(usage=_('''\
|
||||||
|
%prog options
|
||||||
|
|
||||||
|
Customize calibre by loading external plugins.
|
||||||
|
'''))
|
||||||
|
parser.add_option('-a', '--add-plugin', default=None,
|
||||||
|
help=_('Add a plugin by specifying the path to the ZIP file containing it.'))
|
||||||
|
parser.add_option('-b', '--build-plugin', default=None,
|
||||||
|
help=_('For plugin developers: Path to the directory where you are'
|
||||||
|
                      ' developing the plugin. This command will automatically zip '
                      'up the plugin and update it in calibre.'))
    parser.add_option('-r', '--remove-plugin', default=None,
                      help=_('Remove a custom plugin by name. Has no effect on builtin plugins'))
    parser.add_option('--customize-plugin', default=None,
                      help=_('Customize plugin. Specify name of plugin and customization string separated by a comma.'))
    parser.add_option('-l', '--list-plugins', default=False, action='store_true',
                      help=_('List all installed plugins'))
    parser.add_option('--enable-plugin', default=None,
                      help=_('Enable the named plugin'))
    parser.add_option('--disable-plugin', default=None,
                      help=_('Disable the named plugin'))
    return parser


def main(args=sys.argv):
    parser = option_parser()
    if len(args) < 2:
        parser.print_help()
        return 1
    opts, args = parser.parse_args(args)
    if opts.add_plugin is not None:
        plugin = add_plugin(opts.add_plugin)
        print('Plugin added:', plugin.name, plugin.version)
    if opts.build_plugin is not None:
        build_plugin(opts.build_plugin)
    if opts.remove_plugin is not None:
        if remove_plugin(opts.remove_plugin):
            print('Plugin removed')
        else:
            print('No custom plugin named', opts.remove_plugin)
    if opts.customize_plugin is not None:
        name, custom = opts.customize_plugin.split(',')
        plugin = find_plugin(name.strip())
        if plugin is None:
            print('No plugin with the name %s exists' % name)
            return 1
        customize_plugin(plugin, custom)
    if opts.enable_plugin is not None:
        enable_plugin(opts.enable_plugin.strip())
    if opts.disable_plugin is not None:
        disable_plugin(opts.disable_plugin.strip())
    if opts.list_plugins:
        type_len = name_len = 0
        for plugin in initialized_plugins():
            type_len, name_len = max(type_len, len(plugin.type)), max(name_len, len(plugin.name))
        fmt = '%-{}s%-{}s%-15s%-15s%s'.format(type_len + 1, name_len + 1)
        print(fmt % tuple('Type|Name|Version|Disabled|Site Customization'.split('|')))
        print()
        for plugin in initialized_plugins():
            print(fmt % (
                plugin.type, plugin.name,
                plugin.version, is_disabled(plugin),
                plugin_customization(plugin)
            ))
            print('\t', plugin.description)
            if plugin.is_customizable():
                try:
                    print('\t', plugin.customization_help())
                except NotImplementedError:
                    pass
            print()

    return 0


if __name__ == '__main__':
    sys.exit(main())
# }}}
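The option handling above uses Python's stdlib optparse. A minimal sketch of how these flags parse (two option names taken from the parser above; the invocation values are made up):

```python
from optparse import OptionParser

# A stripped-down parser with two of the options defined above
parser = OptionParser()
parser.add_option('-l', '--list-plugins', default=False, action='store_true')
parser.add_option('--enable-plugin', default=None)

# optparse separates recognized options from positional arguments
opts, args = parser.parse_args(['--enable-plugin', 'Some Plugin', '-l', 'extra'])
assert opts.list_plugins is True
assert opts.enable_plugin == 'Some Plugin'
assert args == ['extra']
```

Note that `main()` above passes `sys.argv` through, so `args[0]` is the program name; that is why it checks `len(args) < 2` before parsing.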
ebook_converter/customize/zipplugin.py (new file, 320 lines)
@@ -0,0 +1,320 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import os, zipfile, posixpath, importlib, threading, re, imp, sys
from collections import OrderedDict
from functools import partial

from calibre import as_unicode
from calibre.constants import ispy3
from calibre.customize import (Plugin, numeric_version, platform,
                               InvalidPlugin, PluginNotFound)
from polyglot.builtins import (itervalues, map, string_or_bytes,
                               unicode_type, reload)

# PEP 302 based plugin loading mechanism, works around the bug in zipimport in
# Python 2.x that prevents importing from zip files in locations whose paths
# have non-ASCII characters


def get_resources(zfp, name_or_list_of_names):
    '''
    Load resources from the plugin zip file

    :param name_or_list_of_names: List of paths to resources in the zip file
        using / as separator, or a single path

    :return: A dictionary of the form ``{name : file_contents}``. Any names
        that were not found in the zip file will not be present in the
        dictionary. If a single path is passed in, the return value will
        be just the bytes of the resource, or None if it wasn't found.
    '''
    names = name_or_list_of_names
    if isinstance(names, string_or_bytes):
        names = [names]
    ans = {}
    with zipfile.ZipFile(zfp) as zf:
        for name in names:
            try:
                ans[name] = zf.read(name)
            except:
                import traceback
                traceback.print_exc()
    if len(names) == 1:
        ans = ans.pop(names[0], None)

    return ans


def get_icons(zfp, name_or_list_of_names):
    '''
    Load icons from the plugin zip file

    :param name_or_list_of_names: List of paths to resources in the zip file
        using / as separator, or a single path

    :return: A dictionary of the form ``{name : QIcon}``. Any names
        that were not found in the zip file will be null QIcons.
        If a single path is passed in, the return value will
        be a QIcon.
    '''
    from PyQt5.Qt import QIcon, QPixmap
    names = name_or_list_of_names
    ans = get_resources(zfp, names)
    if isinstance(names, string_or_bytes):
        names = [names]
    if ans is None:
        ans = {}
    if isinstance(ans, string_or_bytes):
        ans = dict([(names[0], ans)])

    ians = {}
    for name in names:
        p = QPixmap()
        raw = ans.get(name, None)
        if raw:
            p.loadFromData(raw)
        ians[name] = QIcon(p)
    if len(names) == 1:
        ians = ians.pop(names[0])
    return ians


_translations_cache = {}


def load_translations(namespace, zfp):
    null = object()
    trans = _translations_cache.get(zfp, null)
    if trans is None:
        return
    if trans is null:
        from calibre.utils.localization import get_lang
        lang = get_lang()
        if not lang or lang == 'en':  # performance optimization
            _translations_cache[zfp] = None
            return
        with zipfile.ZipFile(zfp) as zf:
            try:
                mo = zf.read('translations/%s.mo' % lang)
            except KeyError:
                mo = None  # No translations for this language present
        if mo is None:
            _translations_cache[zfp] = None
            return
        from gettext import GNUTranslations
        from io import BytesIO
        trans = _translations_cache[zfp] = GNUTranslations(BytesIO(mo))

    namespace['_'] = getattr(trans, 'gettext' if ispy3 else 'ugettext')
    namespace['ngettext'] = getattr(trans, 'ngettext' if ispy3 else 'ungettext')
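The cache above has three states per zip file: never looked up, looked up with no translations available (cached `None`), and loaded translations. The private `null = object()` sentinel is what distinguishes "never looked up" from a legitimately cached `None`. A minimal sketch of that pattern (all names here are illustrative):

```python
_cache = {}
_MISSING = object()  # private sentinel: "never computed", distinct from a cached None

calls = []

def expensive_lookup(key):
    calls.append(key)  # track how often the real work runs
    return None        # a legitimate result that happens to be falsy

def cached_lookup(key):
    val = _cache.get(key, _MISSING)
    if val is _MISSING:               # recompute only when truly absent
        val = _cache[key] = expensive_lookup(key)
    return val

assert cached_lookup('de') is None
assert cached_lookup('de') is None    # second call is served from the cache
assert calls == ['de']                # the expensive work ran exactly once
```

Using `_cache.get(key)` without a sentinel would conflate the two states and redo the zip-file scan on every call for languages with no `.mo` file.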


class PluginLoader(object):

    def __init__(self):
        self.loaded_plugins = {}
        self._lock = threading.RLock()
        self._identifier_pat = re.compile(r'[a-zA-Z][_0-9a-zA-Z]*')

    def _get_actual_fullname(self, fullname):
        parts = fullname.split('.')
        if parts[0] == 'calibre_plugins':
            if len(parts) == 1:
                return parts[0], None
            plugin_name = parts[1]
            with self._lock:
                names = self.loaded_plugins.get(plugin_name, None)
                if names is None:
                    raise ImportError('No plugin named %r loaded' % plugin_name)
                names = names[1]
                fullname = '.'.join(parts[2:])
                if not fullname:
                    fullname = '__init__'
                if fullname in names:
                    return fullname, plugin_name
                if fullname + '.__init__' in names:
                    return fullname + '.__init__', plugin_name
        return None, None

    def find_module(self, fullname, path=None):
        fullname, plugin_name = self._get_actual_fullname(fullname)
        if fullname is None and plugin_name is None:
            return None
        return self

    def load_module(self, fullname):
        import_name, plugin_name = self._get_actual_fullname(fullname)
        if import_name is None and plugin_name is None:
            raise ImportError('No plugin named %r is loaded' % fullname)
        mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
        mod.__file__ = "<calibre Plugin Loader>"
        mod.__loader__ = self

        if import_name.endswith('.__init__') or import_name in ('__init__',
                                                                'calibre_plugins'):
            # We have a package
            mod.__path__ = []

        if plugin_name is not None:
            # We have some actual code to load
            with self._lock:
                zfp, names = self.loaded_plugins.get(plugin_name, (None, None))
                if names is None:
                    raise ImportError('No plugin named %r loaded' % plugin_name)
                zinfo = names.get(import_name, None)
                if zinfo is None:
                    raise ImportError('Plugin %r has no module named %r' %
                                      (plugin_name, import_name))
                with zipfile.ZipFile(zfp) as zf:
                    try:
                        code = zf.read(zinfo)
                    except:
                        # Maybe the zip file changed from under us
                        code = zf.read(zinfo.filename)
                compiled = compile(code, 'calibre_plugins.%s.%s' % (plugin_name,
                    import_name), 'exec', dont_inherit=True)
                mod.__dict__['get_resources'] = partial(get_resources, zfp)
                mod.__dict__['get_icons'] = partial(get_icons, zfp)
                mod.__dict__['load_translations'] = partial(load_translations, mod.__dict__, zfp)
                exec(compiled, mod.__dict__)

        return mod

    def load(self, path_to_zip_file):
        if not os.access(path_to_zip_file, os.R_OK):
            raise PluginNotFound('Cannot access %r' % path_to_zip_file)

        with zipfile.ZipFile(path_to_zip_file) as zf:
            plugin_name = self._locate_code(zf, path_to_zip_file)

        try:
            ans = None
            plugin_module = 'calibre_plugins.%s' % plugin_name
            m = sys.modules.get(plugin_module, None)
            if m is not None:
                reload(m)
            else:
                m = importlib.import_module(plugin_module)
            plugin_classes = []
            for obj in itervalues(m.__dict__):
                if isinstance(obj, type) and issubclass(obj, Plugin) and \
                        obj.name != 'Trivial Plugin':
                    plugin_classes.append(obj)
            if not plugin_classes:
                raise InvalidPlugin('No plugin class found in %s:%s' % (
                    as_unicode(path_to_zip_file), plugin_name))
            if len(plugin_classes) > 1:
                plugin_classes.sort(key=lambda c: (getattr(c, '__module__', None) or '').count('.'))

            ans = plugin_classes[0]

            if ans.minimum_calibre_version > numeric_version:
                raise InvalidPlugin(
                    'The plugin at %s needs a version of calibre >= %s' %
                    (as_unicode(path_to_zip_file), '.'.join(map(unicode_type,
                        ans.minimum_calibre_version))))

            if platform not in ans.supported_platforms:
                raise InvalidPlugin(
                    'The plugin at %s cannot be used on %s' %
                    (as_unicode(path_to_zip_file), platform))

            return ans
        except:
            with self._lock:
                del self.loaded_plugins[plugin_name]
            raise

    def _locate_code(self, zf, path_to_zip_file):
        names = [x if isinstance(x, unicode_type) else x.decode('utf-8') for x in
                 zf.namelist()]
        names = [x[1:] if x[0] == '/' else x for x in names]

        plugin_name = None
        for name in names:
            name, ext = posixpath.splitext(name)
            if name.startswith('plugin-import-name-') and ext == '.txt':
                plugin_name = name.rpartition('-')[-1]

        if plugin_name is None:
            c = 0
            while True:
                c += 1
                plugin_name = 'dummy%d' % c
                if plugin_name not in self.loaded_plugins:
                    break
        else:
            if self._identifier_pat.match(plugin_name) is None:
                raise InvalidPlugin((
                    'The plugin at %r uses an invalid import name: %r' %
                    (path_to_zip_file, plugin_name)))

        pynames = [x for x in names if x.endswith('.py')]

        candidates = [posixpath.dirname(x) for x in pynames if
                      x.endswith('/__init__.py')]
        candidates.sort(key=lambda x: x.count('/'))
        valid_packages = set()

        for candidate in candidates:
            parts = candidate.split('/')
            parent = '.'.join(parts[:-1])
            if parent and parent not in valid_packages:
                continue
            valid_packages.add('.'.join(parts))

        names = OrderedDict()

        for candidate in pynames:
            parts = posixpath.splitext(candidate)[0].split('/')
            package = '.'.join(parts[:-1])
            if package and package not in valid_packages:
                continue
            name = '.'.join(parts)
            names[name] = zf.getinfo(candidate)

        # Legacy plugins
        if '__init__' not in names:
            for name in tuple(names):
                if '.' not in name and name.endswith('plugin'):
                    names['__init__'] = names[name]
                    break

        if '__init__' not in names:
            raise InvalidPlugin(('The plugin in %r is invalid. It does not '
                                 'contain a top-level __init__.py file')
                                % path_to_zip_file)

        with self._lock:
            self.loaded_plugins[plugin_name] = (path_to_zip_file, names)

        return plugin_name


loader = PluginLoader()
sys.meta_path.insert(0, loader)


if __name__ == '__main__':
    from tempfile import NamedTemporaryFile
    from calibre.customize.ui import add_plugin
    from calibre import CurrentDir
    path = sys.argv[-1]
    with NamedTemporaryFile(suffix='.zip') as f:
        with zipfile.ZipFile(f, 'w') as zf:
            with CurrentDir(path):
                for x in os.listdir('.'):
                    if x[0] != '.':
                        print('Adding', x)
                        zf.write(x)
                        if os.path.isdir(x):
                            for y in os.listdir(x):
                                zf.write(os.path.join(x, y))
        add_plugin(f.name)
        print('Added plugin from', sys.argv[-1])
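The module-level `sys.meta_path.insert(0, loader)` above hooks the loader into Python's import machinery via the legacy PEP 302 `find_module`/`load_module` protocol. Under Python 3 the same idea is expressed with importlib's `find_spec` protocol; a minimal, self-contained illustration (the module name `hello_virtual` and its contents are made up):

```python
import importlib.abc
import importlib.util
import sys

class VirtualLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        module.greeting = 'hi'  # "execute" the virtual module's body

class VirtualFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == 'hello_virtual':
            return importlib.util.spec_from_loader(fullname, VirtualLoader())
        return None  # let the other finders handle everything else

# Front of sys.meta_path, just like the plugin loader above
sys.meta_path.insert(0, VirtualFinder())

import hello_virtual
assert hello_virtual.greeting == 'hi'
```

The calibre loader does the same thing for any import under the `calibre_plugins.*` namespace, serving module source straight out of the plugin's zip file.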
ebook_converter/devices/__init__.py (new file, 216 lines)
@@ -0,0 +1,216 @@
from __future__ import absolute_import, division, print_function, unicode_literals
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'

'''
Device drivers.
'''

import sys, time, pprint
from functools import partial
from polyglot.builtins import zip, unicode_type

DAY_MAP = dict(Sun=0, Mon=1, Tue=2, Wed=3, Thu=4, Fri=5, Sat=6)
MONTH_MAP = dict(Jan=1, Feb=2, Mar=3, Apr=4, May=5, Jun=6, Jul=7, Aug=8, Sep=9, Oct=10, Nov=11, Dec=12)
INVERSE_DAY_MAP = dict(zip(DAY_MAP.values(), DAY_MAP.keys()))
INVERSE_MONTH_MAP = dict(zip(MONTH_MAP.values(), MONTH_MAP.keys()))


def strptime(src):
    src = src.strip()
    src = src.split()
    src[0] = unicode_type(DAY_MAP[src[0][:-1]]) + ','
    src[2] = unicode_type(MONTH_MAP[src[2]])
    return time.strptime(' '.join(src), '%w, %d %m %Y %H:%M:%S %Z')


def strftime(epoch, zone=time.gmtime):
    src = time.strftime("%w, %d %m %Y %H:%M:%S GMT", zone(epoch)).split()
    src[0] = INVERSE_DAY_MAP[int(src[0][:-1])] + ','
    src[2] = INVERSE_MONTH_MAP[int(src[2])]
    return ' '.join(src)
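The two helpers above convert between RFC 2822-style date strings and time structs by swapping the locale-independent English day/month names for numbers, so `time.strptime` never has to match names against the current locale. A stdlib-only sketch of the parsing half (the sample date is made up):

```python
import time

DAY_MAP = dict(Sun=0, Mon=1, Tue=2, Wed=3, Thu=4, Fri=5, Sat=6)
MONTH_MAP = dict(Jan=1, Feb=2, Mar=3, Apr=4, May=5, Jun=6,
                 Jul=7, Aug=8, Sep=9, Oct=10, Nov=11, Dec=12)

def parse_rfc2822(src):
    parts = src.strip().split()
    parts[0] = str(DAY_MAP[parts[0][:-1]]) + ','   # 'Fri,' -> '5,'
    parts[2] = str(MONTH_MAP[parts[2]])            # 'Jan'  -> '1'
    # %w (weekday digit) and %m (month number) are locale-independent
    return time.strptime(' '.join(parts), '%w, %d %m %Y %H:%M:%S %Z')

t = parse_rfc2822('Fri, 01 Jan 2021 12:00:00 GMT')
assert (t.tm_year, t.tm_mon, t.tm_mday, t.tm_hour) == (2021, 1, 1, 12)
```

A plain `time.strptime(src, '%a, %d %b %Y %H:%M:%S %Z')` would fail under a non-English locale where `%a`/`%b` expect translated names, which is exactly what the name-to-number substitution avoids.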


def get_connected_device():
    from calibre.customize.ui import device_plugins
    from calibre.devices.scanner import DeviceScanner
    dev = None
    scanner = DeviceScanner()
    scanner.scan()
    connected_devices = []
    for d in device_plugins():
        ok, det = scanner.is_device_connected(d)
        if ok:
            dev = d
            dev.reset(log_packets=False, detected_device=det)
            connected_devices.append((det, dev))

    if dev is None:
        print('Unable to find a connected ebook reader.', file=sys.stderr)
        return

    for det, d in connected_devices:
        try:
            d.open(det, None)
        except:
            continue
        else:
            dev = d
            break
    return dev


def debug(ioreg_to_tmp=False, buf=None, plugins=None,
        disabled_plugins=None):
    '''
    If plugins is None, then this method calls startup and shutdown on the
    device plugins. So if you are using it in a context where startup could
    already have been called (for example in the main GUI), pass in the list of
    device plugins as the plugins parameter.
    '''
    import textwrap
    from calibre.customize.ui import device_plugins, disabled_device_plugins
    from calibre.debug import print_basic_debug_info
    from calibre.devices.scanner import DeviceScanner
    from calibre.constants import iswindows, isosx
    from calibre import prints
    from polyglot.io import PolyglotBytesIO
    oldo, olde = sys.stdout, sys.stderr

    if buf is None:
        buf = PolyglotBytesIO()
    sys.stdout = sys.stderr = buf
    out = partial(prints, file=buf)

    devplugins = device_plugins() if plugins is None else plugins
    devplugins = list(sorted(devplugins, key=lambda x: x.__class__.__name__))
    if plugins is None:
        for d in devplugins:
            try:
                d.startup()
            except:
                out('Startup failed for device plugin: %s' % d)

    if disabled_plugins is None:
        disabled_plugins = list(disabled_device_plugins())

    try:
        print_basic_debug_info(out=buf)
        s = DeviceScanner()
        s.scan()
        devices = s.devices
        if not iswindows:
            devices = [list(x) for x in devices]
            for d in devices:
                for i in range(3):
                    d[i] = hex(d[i])
        out('USB devices on system:')
        out(pprint.pformat(devices))

        ioreg = None
        if isosx:
            from calibre.devices.usbms.device import Device
            mount = '\n'.join(repr(x) for x in Device.osx_run_mount().splitlines())
            drives = pprint.pformat(Device.osx_get_usb_drives())
            ioreg = 'Output from mount:\n' + mount + '\n\n'
            ioreg += 'Output from osx_get_usb_drives:\n' + drives + '\n\n'
            ioreg += Device.run_ioreg()
        connected_devices = []
        if disabled_plugins:
            out('\nDisabled plugins:', textwrap.fill(' '.join([x.__class__.__name__ for x in
                disabled_plugins])))
            out(' ')
        else:
            out('\nNo disabled plugins')
        found_dev = False
        for dev in devplugins:
            if not dev.MANAGES_DEVICE_PRESENCE:
                continue
            out('Looking for devices of type:', dev.__class__.__name__)
            if dev.debug_managed_device_detection(s.devices, buf):
                found_dev = True
                break
            out(' ')

        if not found_dev:
            out('Looking for devices...')
            for dev in devplugins:
                if dev.MANAGES_DEVICE_PRESENCE:
                    continue
                connected, det = s.is_device_connected(dev, debug=True)
                if connected:
                    out('\t\tDetected possible device', dev.__class__.__name__)
                    connected_devices.append((dev, det))

            out(' ')
            errors = {}
            success = False
            out('Devices possibly connected:', end=' ')
            for dev, det in connected_devices:
                out(dev.name, end=', ')
            if not connected_devices:
                out('None', end='')
            out(' ')
            for dev, det in connected_devices:
                out('Trying to open', dev.name, '...', end=' ')
                dev.do_device_debug = True
                try:
                    dev.reset(detected_device=det)
                    dev.open(det, None)
                    out('OK')
                except:
                    import traceback
                    errors[dev] = traceback.format_exc()
                    out('failed')
                    continue
                dev.do_device_debug = False
                success = True
                if hasattr(dev, '_main_prefix'):
                    out('Main memory:', repr(dev._main_prefix))
                out('Total space:', dev.total_space())
                break
            if not success and errors:
                out('Opening of the following devices failed')
                for dev, msg in errors.items():
                    out(dev)
                    out(msg)
                out(' ')

            if ioreg is not None:
                ioreg = 'IOREG Output\n' + ioreg
                out(' ')
                if ioreg_to_tmp:
                    lopen('/tmp/ioreg.txt', 'wb').write(ioreg)
                    out('Dont forget to send the contents of /tmp/ioreg.txt')
                    out('You can open it with the command: open /tmp/ioreg.txt')
                else:
                    out(ioreg)

        if hasattr(buf, 'getvalue'):
            return buf.getvalue().decode('utf-8', 'replace')
    finally:
        sys.stdout = oldo
        sys.stderr = olde
        if plugins is None:
            for d in devplugins:
                try:
                    d.shutdown()
                except:
                    pass


def device_info(ioreg_to_tmp=False, buf=None):
    from calibre.devices.scanner import DeviceScanner

    res = {}
    res['device_set'] = device_set = set()
    res['device_details'] = device_details = {}

    s = DeviceScanner()
    s.scan()
    devices = s.devices
    devices = [tuple(x) for x in devices]
    for dev in devices:
        device_set.add(dev)
        device_details[dev] = dev[0:3]
    return res
ebook_converter/devices/interface.py (new file, 787 lines)
@@ -0,0 +1,787 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
import os
from collections import namedtuple

from calibre import prints
from calibre.constants import iswindows
from calibre.customize import Plugin


class DevicePlugin(Plugin):
    """
    Defines the interface that should be implemented by backends that
    communicate with an e-book reader.
    """
    type = _('Device interface')

    #: Ordered list of supported formats
    FORMATS = ["lrf", "rtf", "pdf", "txt"]
    # If True, the config dialog will not show the formats box
    HIDE_FORMATS_CONFIG_BOX = False

    #: VENDOR_ID can be either an integer, a list of integers or a dictionary.
    #: If it is a dictionary, it must be a dictionary of dictionaries,
    #: of the form::
    #:
    #:   {
    #:    integer_vendor_id : { product_id : [list of BCDs], ... },
    #:    ...
    #:   }
    #:
    VENDOR_ID = 0x0000

    #: An integer or a list of integers
    PRODUCT_ID = 0x0000
    #: BCD can be either None to not distinguish between devices based on BCD, or
    #: it can be a list of the BCD numbers of all devices supported by this driver.
    BCD = None

    #: Height for thumbnails on the device
    THUMBNAIL_HEIGHT = 68

    #: Compression quality for thumbnails. Set this closer to 100 to have better
    #: quality thumbnails with fewer compression artifacts. Of course, the
    #: thumbnails get larger as well.
    THUMBNAIL_COMPRESSION_QUALITY = 75

    #: Set this to True if the device supports updating cover thumbnails during
    #: sync_booklists. Setting it to True will ask device.py to refresh the
    #: cover thumbnails during book matching.
    WANTS_UPDATED_THUMBNAILS = False

    #: Whether the metadata on books can be set via the GUI.
    CAN_SET_METADATA = ['title', 'authors', 'collections']

    #: Whether the device can handle device_db metadata plugboards
    CAN_DO_DEVICE_DB_PLUGBOARD = False

    # Set this to None if the books on the device are files that the GUI can
    # access in order to add the books from the device to the library
    BACKLOADING_ERROR_MESSAGE = _('Cannot get files from this device')

    #: Path separator for paths to books on device
    path_sep = os.sep

    #: Icon for this device
    icon = I('reader.png')

    # Encapsulates an annotation fetched from the device
    UserAnnotation = namedtuple('Annotation', 'type, value')

    #: GUI displays this as a message if not None. Useful if opening can take a
    #: long time
    OPEN_FEEDBACK_MESSAGE = None

    #: Set of extensions that are "virtual books" on the device
    #: and therefore cannot be viewed/saved/added to library.
    #: For example: ``frozenset(['kobo'])``
    VIRTUAL_BOOK_EXTENSIONS = frozenset()

    #: Message to display to user for virtual book extensions.
    VIRTUAL_BOOK_EXTENSION_MESSAGE = None

    #: Whether to nuke comments in the copy of the book sent to the device. If
    #: not None, this should be a short string that the comments will be
    #: replaced by.
    NUKE_COMMENTS = None

    #: If True, indicates that this driver completely manages device detection,
    #: ejecting and so forth. If you set this to True, you *must* implement the
    #: detect_managed_devices and debug_managed_device_detection methods.
    #: A driver with this set to True is responsible for detection of devices,
    #: managing a blacklist of devices, a list of ejected devices and so forth.
    #: calibre will periodically call the detect_managed_devices() method and
    #: if it returns a detected device, calibre will call open(). open() will
    #: be called every time a device is returned, even if previous calls to open()
    #: failed, therefore the driver must maintain its own blacklist of failed
    #: devices. Similarly, when ejecting, calibre will call eject() and then,
    #: assuming the next call to detect_managed_devices() returns None, it will
    #: call post_yank_cleanup().
    MANAGES_DEVICE_PRESENCE = False

    #: If set to True, calibre will call the :meth:`get_driveinfo()` method
    #: after the books lists have been loaded to get the driveinfo.
    SLOW_DRIVEINFO = False

    #: If set to True, calibre will ask the user if they want to manage the
    #: device with calibre, the first time it is detected. If you set this to
    #: True you must implement :meth:`get_device_uid()` and
    #: :meth:`ignore_connected_device()` and
    #: :meth:`get_user_blacklisted_devices` and
    #: :meth:`set_user_blacklisted_devices`
    ASK_TO_ALLOW_CONNECT = False

    #: Set this to a dictionary of the form {'title':title, 'msg':msg, 'det_msg':detailed_msg}
    #: to have calibre popup a message to the user after some callbacks are run
    #: (currently only upload_books). Be careful to not spam the user with too
    #: many messages. This variable is checked after *every* callback,
    #: so only set it when you really need to.
    user_feedback_after_callback = None

    @classmethod
    def get_gui_name(cls):
        if hasattr(cls, 'gui_name'):
            return cls.gui_name
        if hasattr(cls, '__name__'):
            return cls.__name__
        return cls.name

    # Device detection {{{
    def test_bcd(self, bcdDevice, bcd):
        if bcd is None or len(bcd) == 0:
            return True
        for c in bcd:
            if c == bcdDevice:
                return True
        return False

    def is_usb_connected(self, devices_on_system, debug=False, only_presence=False):
        '''
        Return True, device_info if a device handled by this plugin is currently connected.

        :param devices_on_system: List of devices currently connected

        '''
        vendors_on_system = {x[0] for x in devices_on_system}
        vendors = set(self.VENDOR_ID) if hasattr(self.VENDOR_ID, '__len__') else {self.VENDOR_ID}
        if hasattr(self.VENDOR_ID, 'keys'):
            products = []
            for ven in self.VENDOR_ID:
                products.extend(self.VENDOR_ID[ven].keys())
        else:
            products = self.PRODUCT_ID if hasattr(self.PRODUCT_ID, '__len__') else [self.PRODUCT_ID]

        ch = self.can_handle_windows if iswindows else self.can_handle
        for vid in vendors_on_system.intersection(vendors):
            for dev in devices_on_system:
                cvid, pid, bcd = dev[:3]
                if cvid == vid:
                    if pid in products:
                        if hasattr(self.VENDOR_ID, 'keys'):
                            try:
                                cbcd = self.VENDOR_ID[vid][pid]
                            except KeyError:
                                # Vendor vid does not have product pid, pid
                                # exists for some other vendor in this
                                # device
                                continue
                        else:
                            cbcd = self.BCD
                        if self.test_bcd(bcd, cbcd):
                            if debug:
                                prints(dev)
                            if ch(dev, debug=debug):
                                return True, dev
        return False, None
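When VENDOR_ID takes the nested-dictionary form described in the class attributes above, the matching in is_usb_connected reduces to a lookup of (vendor id, product id) followed by a BCD membership test. A minimal sketch of that core logic (the table values and device triples below are made up, not real hardware IDs):

```python
# Hypothetical vendor table in the dictionary form described above:
# { vendor_id : { product_id : [list of acceptable BCDs] } }
VENDOR_ID = {
    0x2080: {0x0001: [0x0322], 0x0002: [0x0322, 0x0323]},
}

def matches(dev, table):
    # dev is a (vendor_id, product_id, bcd) triple as produced by the scanner
    vid, pid, bcd = dev[:3]
    bcds = table.get(vid, {}).get(pid)
    return bcds is not None and bcd in bcds

assert matches((0x2080, 0x0002, 0x0323), VENDOR_ID)       # known vid/pid, BCD listed
assert not matches((0x2080, 0x0001, 0x0323), VENDOR_ID)   # BCD not in this product's list
assert not matches((0x2080, 0x0003, 0x0322), VENDOR_ID)   # unknown product id
```

The real method additionally handles the scalar and list forms of VENDOR_ID/PRODUCT_ID, a None BCD (match anything), and the per-platform can_handle hook.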
|
||||||
|
|
||||||
|
def detect_managed_devices(self, devices_on_system, force_refresh=False):
|
||||||
|
'''
|
||||||
|
Called only if MANAGES_DEVICE_PRESENCE is True.
|
||||||
|
|
||||||
|
Scan for devices that this driver can handle. Should return a device
|
||||||
|
object if a device is found. This object will be passed to the open()
|
||||||
|
method as the connected_device. If no device is found, return None. The
|
||||||
|
returned object can be anything, calibre does not use it, it is only
|
||||||
|
passed to open().
|
||||||
|
|
||||||
|
This method is called periodically by the GUI, so make sure it is not
|
||||||
|
too resource intensive. Use a cache to avoid repeatedly scanning the
|
||||||
|
system.
|
||||||
|
|
||||||
|
:param devices_on_system: Set of USB devices found on the system.
|
||||||
|
|
||||||
|
:param force_refresh: If True and the driver uses a cache to prevent
|
||||||
|
repeated scanning, the cache must be flushed.
|
||||||
|
|
||||||
|
'''
|
||||||
|
raise NotImplementedError()
|
||||||
|
|
||||||
|
def debug_managed_device_detection(self, devices_on_system, output):
|
||||||
|
'''
|
||||||
|
Called only if MANAGES_DEVICE_PRESENCE is True.
|
||||||
|
|
||||||
|
Should write information about the devices detected on the system to
|
||||||
|
output, which is a file like object.
|
||||||
|
|
||||||
|
Should return True if a device was detected and successfully opened,
|
||||||
|
otherwise False.
|
||||||
|
'''
|
||||||
|
raise NotImplementedError()
|
||||||
|
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
def reset(self, key='-1', log_packets=False, report_progress=None,
|
||||||
|
detected_device=None):
|
||||||
|
"""
|
||||||
|
:param key: The key to unlock the device
|
||||||
|
:param log_packets: If true the packet stream to/from the device is logged
|
||||||
|
:param report_progress: Function that is called with a % progress
|
||||||
|
(number between 0 and 100) for various tasks
|
||||||
|
If it is called with -1 that means that the
|
||||||
|
task does not have any progress information
|
||||||
|
:param detected_device: Device information from the device scanner
|
||||||
|
|
||||||
|
"""
|
||||||
|
raise NotImplementedError()
|
||||||
|
|
||||||
|
    def can_handle_windows(self, usbdevice, debug=False):
        '''
        Optional method to perform further checks on a device to see if this driver
        is capable of handling it. If it is not it should return False. This method
        is only called after the vendor, product ids and the bcd have matched, so
        it can do some relatively time intensive checks. The default implementation
        returns True. This method is called only on Windows. See also
        :meth:`can_handle`.

        Note that for devices based on USBMS this method by default delegates
        to :meth:`can_handle`. So you only need to override :meth:`can_handle`
        in your subclass of USBMS.

        :param usbdevice: A usbdevice as returned by :func:`calibre.devices.winusb.scan_usb_devices`
        '''
        return True

    def can_handle(self, device_info, debug=False):
        '''
        Unix version of :meth:`can_handle_windows`.

        :param device_info: Is a tuple of (vid, pid, bcd, manufacturer, product,
                            serial number)

        '''
        return True
    can_handle.is_base_class_implementation = True

    def open(self, connected_device, library_uuid):
        '''
        Perform any device specific initialization. Called after the device is
        detected but before any other functions that communicate with the device.
        For example: For devices that present themselves as USB Mass storage
        devices, this method would be responsible for mounting the device or,
        if the device has been automounted, for finding out where it has been
        mounted. The method :meth:`calibre.devices.usbms.device.Device.open` has
        an implementation of this function that should serve as a good example
        for USB Mass storage devices.

        This method can raise an OpenFeedback exception to display a message to
        the user.

        :param connected_device: The device that we are trying to open. It is
            a tuple of (vendor id, product id, bcd, manufacturer name, product
            name, device serial number). However, some devices have no serial
            number and on Windows only the first three fields are present, the
            rest are None.

        :param library_uuid: The UUID of the current calibre library. Can be
            None if there is no library (for example when used from the command
            line).

        '''
        raise NotImplementedError()

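The ``device_info`` tuple handed to ``can_handle`` can be exercised with a standalone sketch like the one below. The vendor/product ids and matching logic are invented for illustration; a real driver would match its own hardware:

```python
# Hypothetical illustration of the (vid, pid, bcd, manufacturer, product,
# serial number) tuple that can_handle() receives; the ids are made up.
def can_handle(device_info, debug=False):
    vid, pid, bcd, manufacturer, product, serial = device_info
    # Accept only a specific vendor/product pair, plus a sanity check on
    # the reported product string.
    if (vid, pid) != (0x1949, 0x0004):
        return False
    return product is None or 'reader' in product.lower()

print(can_handle((0x1949, 0x0004, 0x0100, 'Acme', 'Acme Reader', 'SN123')))
print(can_handle((0x05e3, 0x0608, 0x0100, 'Other', 'USB Hub', None)))
```

Since vendor/product/bcd matching has already happened by the time this is called, the body is free to do slower checks (e.g. probing the product string) without affecting unrelated devices.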
    def eject(self):
        '''
        Un-mount / eject the device from the OS. This does not check if there
        are pending GUI jobs that need to communicate with the device.

        NOTE: This method may not be called on the same thread as the rest
        of the device methods.
        '''
        raise NotImplementedError()

    def post_yank_cleanup(self):
        '''
        Called if the user yanks the device without ejecting it first.
        '''
        raise NotImplementedError()

    def set_progress_reporter(self, report_progress):
        '''
        Set a function to report progress information.

        :param report_progress: Function that is called with a % progress
                                (number between 0 and 100) for various tasks.
                                If it is called with -1 that means that the
                                task does not have any progress information

        '''
        raise NotImplementedError()

    def get_device_information(self, end_session=True):
        """
        Ask device for device information. See L{DeviceInfoQuery}.

        :return: (device name, device version, software version on device, mime type)
                 The tuple can optionally have a fifth element, which is a
                 drive information dictionary. See usbms.driver for an example.

        """
        raise NotImplementedError()

    def get_driveinfo(self):
        '''
        Return the driveinfo dictionary. Usually called from
        get_device_information(), but if loading the driveinfo is slow for this
        driver, then it should set SLOW_DRIVEINFO. In this case, this method
        will be called by calibre after the book lists have been loaded. Note
        that it is not called on the device thread, so the driver should cache
        the drive info in the books() method and this function should return
        the cached data.
        '''
        return {}

    def card_prefix(self, end_session=True):
        '''
        Return a 2 element list of the prefix to paths on the cards.
        If no card is present None is set for the card's prefix.
        E.g.
        ('/place', '/place2')
        (None, 'place2')
        ('place', None)
        (None, None)
        '''
        raise NotImplementedError()

    def total_space(self, end_session=True):
        """
        Get total space available on the mountpoints:
            1. Main memory
            2. Memory Card A
            3. Memory Card B

        :return: A 3 element list with total space in bytes of (1, 2, 3). If a
                 particular device doesn't have any of these locations it should return 0.

        """
        raise NotImplementedError()

    def free_space(self, end_session=True):
        """
        Get free space available on the mountpoints:
            1. Main memory
            2. Card A
            3. Card B

        :return: A 3 element list with free space in bytes of (1, 2, 3). If a
                 particular device doesn't have any of these locations it should return -1.

        """
        raise NotImplementedError()

    def books(self, oncard=None, end_session=True):
        """
        Return a list of e-books on the device.

        :param oncard: If 'carda' or 'cardb' return a list of e-books on the
                       specific storage card, otherwise return list of e-books
                       in main memory of device. If a card is specified and no
                       books are on the card return empty list.

        :return: A BookList.

        """
        raise NotImplementedError()

    def upload_books(self, files, names, on_card=None, end_session=True,
                     metadata=None):
        '''
        Upload a list of books to the device. If a file already
        exists on the device, it should be replaced.
        This method should raise a :class:`FreeSpaceError` if there is not enough
        free space on the device. The text of the FreeSpaceError must contain the
        word "card" if ``on_card`` is not None, otherwise it must contain the word "memory".

        :param files: A list of paths
        :param names: A list of file names that the books should have
                      once uploaded to the device. len(names) == len(files)
        :param metadata: If not None, it is a list of :class:`Metadata` objects.
                         The idea is to use the metadata to determine where on the device to
                         put the book. len(metadata) == len(files). Apart from the regular
                         cover (path to cover), there may also be a thumbnail attribute, which should
                         be used in preference. The thumbnail attribute is of the form
                         (width, height, cover_data as jpeg).

        :return: A list of 3-element tuples. The list is meant to be passed
                 to :meth:`add_books_to_metadata`.
        '''
        raise NotImplementedError()

    @classmethod
    def add_books_to_metadata(cls, locations, metadata, booklists):
        '''
        Add locations to the booklists. This function must not communicate with
        the device.

        :param locations: Result of a call to L{upload_books}
        :param metadata: List of :class:`Metadata` objects, same as for
                         :meth:`upload_books`.
        :param booklists: A tuple containing the result of calls to
                          (:meth:`books(oncard=None)`,
                          :meth:`books(oncard='carda')`,
                          :meth:`books(oncard='cardb')`).

        '''
        raise NotImplementedError()

    def delete_books(self, paths, end_session=True):
        '''
        Delete books at paths on device.
        '''
        raise NotImplementedError()

    @classmethod
    def remove_books_from_metadata(cls, paths, booklists):
        '''
        Remove books from the metadata list. This function must not communicate
        with the device.

        :param paths: paths to books on the device.
        :param booklists: A tuple containing the result of calls to
                          (:meth:`books(oncard=None)`,
                          :meth:`books(oncard='carda')`,
                          :meth:`books(oncard='cardb')`).

        '''
        raise NotImplementedError()

    def sync_booklists(self, booklists, end_session=True):
        '''
        Update metadata on device.

        :param booklists: A tuple containing the result of calls to
                          (:meth:`books(oncard=None)`,
                          :meth:`books(oncard='carda')`,
                          :meth:`books(oncard='cardb')`).

        '''
        raise NotImplementedError()

    def get_file(self, path, outfile, end_session=True):
        '''
        Read the file at ``path`` on the device and write it to outfile.

        :param outfile: file object like ``sys.stdout`` or the result of an
                        :func:`open` call.

        '''
        raise NotImplementedError()

    @classmethod
    def config_widget(cls):
        '''
        Should return a QWidget. The QWidget contains the settings for the
        device interface.
        '''
        raise NotImplementedError()

    @classmethod
    def save_settings(cls, settings_widget):
        '''
        Should save settings to disk. Takes the widget created in
        :meth:`config_widget` and saves all settings to disk.
        '''
        raise NotImplementedError()

    @classmethod
    def settings(cls):
        '''
        Should return an opts object. The opts object should have at least one
        attribute `format_map` which is an ordered list of formats for the
        device.
        '''
        raise NotImplementedError()

    def set_plugboards(self, plugboards, pb_func):
        '''
        Provide the driver the current set of plugboards and a function to
        select a specific plugboard. This method is called immediately before
        add_books and sync_booklists.

        pb_func is a callable with the following signature::
            def pb_func(device_name, format, plugboards)

        You give it the current device name (either the class name or
        DEVICE_PLUGBOARD_NAME), the format you are interested in (a 'real'
        format or 'device_db'), and the plugboards (you were given those by
        set_plugboards, the same place you got this method).

        :return: None or a single plugboard instance.

        '''
        pass

    def set_driveinfo_name(self, location_code, name):
        '''
        Set the device name in the driveinfo file to 'name'. This setting will
        persist until the file is re-created or the name is changed again.

        Non-disk devices should implement this method based on the location
        codes returned by the get_device_information() method.
        '''
        pass

    def prepare_addable_books(self, paths):
        '''
        Given a list of paths, returns another list of paths. These paths
        point to addable versions of the books.

        If there is an error preparing a book, then instead of a path, the
        position in the returned list for that book should be a three tuple:
        (original_path, the exception instance, traceback)
        '''
        return paths

    def startup(self):
        '''
        Called when calibre is starting the device. Do any initialization
        required. Note that multiple instances of the class can be instantiated,
        and thus __init__ can be called multiple times, but only one instance
        will have this method called. This method is called on the device
        thread, not the GUI thread.
        '''
        pass

    def shutdown(self):
        '''
        Called when calibre is shutting down, either for good or in preparation
        to restart. Do any cleanup required. This method is called on the
        device thread, not the GUI thread.
        '''
        pass

    def get_device_uid(self):
        '''
        Must return a unique id for the currently connected device (this is
        called immediately after a successful call to open()). You must
        implement this method if you set ASK_TO_ALLOW_CONNECT = True
        '''
        raise NotImplementedError()

    def ignore_connected_device(self, uid):
        '''
        Should ignore the device identified by uid (the result of a call to
        get_device_uid()) in the future. You must implement this method if you
        set ASK_TO_ALLOW_CONNECT = True. Note that this function is called
        immediately after open(), so if open() caches some state, the driver
        should reset that state.
        '''
        raise NotImplementedError()

    def get_user_blacklisted_devices(self):
        '''
        Return map of device uid to friendly name for all devices that the user
        has asked to be ignored.
        '''
        return {}

    def set_user_blacklisted_devices(self, devices):
        '''
        Set the list of device uids that should be ignored by this driver.
        '''
        pass

    def specialize_global_preferences(self, device_prefs):
        '''
        Implement this method if your device wants to override a particular
        preference. You must ensure that all call sites that want a preference
        that can be overridden use device_prefs['something'] instead of
        prefs['something']. Your method should call
        device_prefs.set_overrides(pref=val, pref=val, ...).
        Currently used for:
        metadata management (prefs['manage_device_metadata'])
        '''
        device_prefs.set_overrides()

    def set_library_info(self, library_name, library_uuid, field_metadata):
        '''
        Implement this method if you want information about the current calibre
        library. This method is called at startup and when the calibre library
        changes while connected.
        '''
        pass

    # Dynamic control interface.
    # The following methods are probably called on the GUI thread. Any driver
    # that implements these methods must take pains to be thread safe, because
    # the device_manager might be using the driver at the same time that one of
    # these methods is called.

    def is_dynamically_controllable(self):
        '''
        Called by the device manager when starting plugins. If this method returns
        a string, then a) it supports the device manager's dynamic control
        interface, and b) that name is to be used when talking to the plugin.

        This method can be called on the GUI thread. A driver that implements
        this method must be thread safe.
        '''
        return None

    def start_plugin(self):
        '''
        This method is called to start the plugin. The plugin should begin
        to accept device connections however it does that. If the plugin is
        already accepting connections, then do nothing.

        This method can be called on the GUI thread. A driver that implements
        this method must be thread safe.
        '''
        pass

    def stop_plugin(self):
        '''
        This method is called to stop the plugin. The plugin should no longer
        accept connections, and should clean up behind itself. It is likely that
        this method should call shutdown. If the plugin is already not accepting
        connections, then do nothing.

        This method can be called on the GUI thread. A driver that implements
        this method must be thread safe.
        '''
        pass

    def get_option(self, opt_string, default=None):
        '''
        Return the value of the option indicated by opt_string. This method can
        be called when the plugin is not started. Return None if the option does
        not exist.

        This method can be called on the GUI thread. A driver that implements
        this method must be thread safe.
        '''
        return default

    def set_option(self, opt_string, opt_value):
        '''
        Set the value of the option indicated by opt_string. This method can
        be called when the plugin is not started.

        This method can be called on the GUI thread. A driver that implements
        this method must be thread safe.
        '''
        pass

    def is_running(self):
        '''
        Return True if the plugin is started, otherwise False.

        This method can be called on the GUI thread. A driver that implements
        this method must be thread safe.
        '''
        return False

    def synchronize_with_db(self, db, book_id, book_metadata, first_call):
        '''
        Called during book matching when a book on the device is matched with
        a book in calibre's db. The method is responsible for synchronizing
        data from the device to calibre's db (if needed).

        The method must return a two-value tuple. The first value is a set of
        calibre book ids changed if calibre's database was changed, or None if the
        database was not changed. If the first value is an empty set then the
        metadata for the book on the device is updated with calibre's metadata
        and given back to the device, but no GUI refresh of that book is done.
        This is useful when the calibre data is correct but must be sent to the
        device.

        The second value is itself a 2-value tuple. The first value in the tuple
        specifies whether a book format should be sent to the device. The intent
        is to permit verifying that the book on the device is the same as the
        book in calibre. This value must be None if no book is to be sent,
        otherwise return the base file name on the device (a string like
        foobar.epub). Be sure to include the extension in the name. The device
        subsystem will construct a send_books job for all books with non-None
        returned values. Note: other than to later retrieve the extension, the
        name is ignored in cases where the device uses a template to generate
        the file name, which most do. The second value in the returned tuple
        indicates whether the format is future-dated. Return True if it is,
        otherwise return False. calibre will display a dialog to the user
        listing all future dated books.

        Extremely important: this method is called on the GUI thread. It must
        be threadsafe with respect to the device manager's thread.

        book_id: the calibre id for the book in the database.
        book_metadata: the Metadata object for the book coming from the device.
        first_call: True if this is the first call during a sync, False otherwise
        '''
        return (None, (None, False))


class BookList(list):
    '''
    A list of books. Each Book object must have the fields:

    #. title
    #. authors
    #. size (file size of the book)
    #. datetime (a UTC time tuple)
    #. path (path on the device to the book)
    #. thumbnail (can be None) thumbnail is either a str/bytes object with the
       image data or it should have an attribute image_path that stores an
       absolute (platform native) path to the image
    #. tags (a list of strings, can be empty).

    '''

    __getslice__ = None
    __setslice__ = None

    def __init__(self, oncard, prefix, settings):
        pass

    def supports_collections(self):
        ''' Return True if the device supports collections for this book list. '''
        raise NotImplementedError()

    def add_book(self, book, replace_metadata):
        '''
        Add the book to the booklist. Intent is to maintain any device-internal
        metadata. Return True if booklists must be sync'ed.
        '''
        raise NotImplementedError()

    def remove_book(self, book):
        '''
        Remove a book from the booklist. Correct any device metadata at the
        same time.
        '''
        raise NotImplementedError()

    def get_collections(self, collection_attributes):
        '''
        Return a dictionary of collections created from collection_attributes.
        Each entry in the dictionary is of the form collection name:[list of
        books].

        The list of books is sorted by book title, except for collections
        created from series, in which case series_index is used.

        :param collection_attributes: A list of attributes of the Book object

        '''
        raise NotImplementedError()


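The get_collections contract (group books by attribute values, sort series collections by series_index and everything else by title) can be modelled with a standalone sketch. The ``Book`` namedtuple here is a made-up stand-in for calibre's Book object, for illustration only:

```python
from collections import namedtuple

# Hypothetical stand-in for the Book object described in the BookList docs.
Book = namedtuple('Book', 'title series series_index tags')

def get_collections(books, collection_attributes):
    # Group books under every value of each requested attribute (list-valued
    # attributes like tags contribute one collection per element).
    grouped = {}
    for book in books:
        for attr in collection_attributes:
            value = getattr(book, attr, None)
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                if v:
                    grouped.setdefault((v, attr), []).append(book)
    # Sort each collection: series_index for series, title otherwise.
    result = {}
    for (name, attr), items in grouped.items():
        key = (lambda b: b.series_index) if attr == 'series' else (lambda b: b.title)
        result[name] = sorted(items, key=key)
    return result

books = [
    Book('Zeta', 'Saga', 1, ['sf']),
    Book('Alpha', 'Saga', 2, ['sf']),
]
cols = get_collections(books, ['series', 'tags'])
print([b.title for b in cols['Saga']])  # series_index order
print([b.title for b in cols['sf']])    # title order
```

A real implementation would also have to merge collections whose names collide across attributes; the sketch keeps the last one it sees.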
class CurrentlyConnectedDevice(object):

    def __init__(self):
        self._device = None

    @property
    def device(self):
        return self._device


# A device driver can check if a device is currently connected to calibre using
# the following code::
#   from calibre.device.interface import currently_connected_device
#   if currently_connected_device.device is None:
#       # no device connected
# The device attribute will be either None or the device driver object
# (DevicePlugin instance) for the currently connected device.
currently_connected_device = CurrentlyConnectedDevice()
41
ebook_converter/ebooks/BeautifulSoup.py
Normal file
@@ -0,0 +1,41 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2019, Kovid Goyal <kovid at kovidgoyal.net>

from __future__ import absolute_import, division, print_function, unicode_literals

import bs4
from bs4 import (  # noqa
    CData, Comment, Declaration, NavigableString, ProcessingInstruction,
    SoupStrainer, Tag, __version__
)

from polyglot.builtins import unicode_type


def parse_html(markup):
    from calibre.ebooks.chardet import strip_encoding_declarations, xml_to_unicode, substitute_entites
    from calibre.utils.cleantext import clean_xml_chars
    if isinstance(markup, unicode_type):
        markup = strip_encoding_declarations(markup)
        markup = substitute_entites(markup)
    else:
        markup = xml_to_unicode(markup, strip_encoding_pats=True, resolve_entities=True)[0]
    markup = clean_xml_chars(markup)
    from html5_parser.soup import parse
    return parse(markup, return_root=False)


def prettify(soup):
    ans = soup.prettify()
    if isinstance(ans, bytes):
        ans = ans.decode('utf-8')
    return ans


def BeautifulSoup(markup='', *a, **kw):
    return parse_html(markup)


def BeautifulStoneSoup(markup='', *a, **kw):
    return bs4.BeautifulSoup(markup, 'xml')
248
ebook_converter/ebooks/__init__.py
Normal file
@@ -0,0 +1,248 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'

'''
Code for the conversion of ebook formats and the reading of metadata
from various formats.
'''

import os, re, numbers, sys
from calibre import prints
from calibre.ebooks.chardet import xml_to_unicode
from polyglot.builtins import unicode_type


class ConversionError(Exception):

    def __init__(self, msg, only_msg=False):
        Exception.__init__(self, msg)
        self.only_msg = only_msg


class UnknownFormatError(Exception):
    pass


class DRMError(ValueError):
    pass


class ParserError(ValueError):
    pass


BOOK_EXTENSIONS = ['lrf', 'rar', 'zip', 'rtf', 'lit', 'txt', 'txtz', 'text', 'htm', 'xhtm',
                   'html', 'htmlz', 'xhtml', 'pdf', 'pdb', 'updb', 'pdr', 'prc', 'mobi', 'azw', 'doc',
                   'epub', 'fb2', 'fbz', 'djv', 'djvu', 'lrx', 'cbr', 'cbz', 'cbc', 'oebzip',
                   'rb', 'imp', 'odt', 'chm', 'tpz', 'azw1', 'pml', 'pmlz', 'mbp', 'tan', 'snb',
                   'xps', 'oxps', 'azw4', 'book', 'zbf', 'pobi', 'docx', 'docm', 'md',
                   'textile', 'markdown', 'ibook', 'ibooks', 'iba', 'azw3', 'ps', 'kepub', 'kfx', 'kpf']


def return_raster_image(path):
    from calibre.utils.imghdr import what
    if os.access(path, os.R_OK):
        with open(path, 'rb') as f:
            raw = f.read()
        if what(None, raw) not in (None, 'svg'):
            return raw


def extract_cover_from_embedded_svg(html, base, log):
    from calibre.ebooks.oeb.base import XPath, SVG, XLINK
    from calibre.utils.xml_parse import safe_xml_fromstring
    root = safe_xml_fromstring(html)

    svg = XPath('//svg:svg')(root)
    if len(svg) == 1 and len(svg[0]) == 1 and svg[0][0].tag == SVG('image'):
        image = svg[0][0]
        href = image.get(XLINK('href'), None)
        if href:
            path = os.path.join(base, *href.split('/'))
            return return_raster_image(path)


def extract_calibre_cover(raw, base, log):
    from calibre.ebooks.BeautifulSoup import BeautifulSoup
    soup = BeautifulSoup(raw)
    matches = soup.find(name=['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'span',
                              'font', 'br'])
    images = soup.findAll('img', src=True)
    if matches is None and len(images) == 1 and \
            images[0].get('alt', '').lower() == 'cover':
        img = images[0]
        img = os.path.join(base, *img['src'].split('/'))
        q = return_raster_image(img)
        if q is not None:
            return q

    # Look for a simple cover, i.e. a body with no text and only one <img> tag
    if matches is None:
        body = soup.find('body')
        if body is not None:
            text = u''.join(map(unicode_type, body.findAll(text=True)))
            if text.strip():
                # Body has text, abort
                return
            images = body.findAll('img', src=True)
            if len(images) == 1:
                img = os.path.join(base, *images[0]['src'].split('/'))
                return return_raster_image(img)


def render_html_svg_workaround(path_to_html, log, width=590, height=750):
    from calibre.ebooks.oeb.base import SVG_NS
    with open(path_to_html, 'rb') as f:
        raw = f.read()
    raw = xml_to_unicode(raw, strip_encoding_pats=True)[0]
    data = None
    if SVG_NS in raw:
        try:
            data = extract_cover_from_embedded_svg(
                raw, os.path.dirname(path_to_html), log)
        except Exception:
            pass
    if data is None:
        try:
            data = extract_calibre_cover(raw, os.path.dirname(path_to_html), log)
        except Exception:
            pass

    if data is None:
        data = render_html_data(path_to_html, width, height)
    return data


def render_html_data(path_to_html, width, height):
    from calibre.ptempfile import TemporaryDirectory
    from calibre.utils.ipc.simple_worker import fork_job, WorkerError
    result = {}

    def report_error(text=''):
        prints('Failed to render', path_to_html, 'with errors:', file=sys.stderr)
        if text:
            prints(text, file=sys.stderr)
        if result and result['stdout_stderr']:
            with open(result['stdout_stderr'], 'rb') as f:
                prints(f.read(), file=sys.stderr)

    with TemporaryDirectory('-render-html') as tdir:
        try:
            result = fork_job('calibre.ebooks.render_html', 'main', args=(path_to_html, tdir, 'jpeg'))
        except WorkerError as e:
            report_error(e.orig_tb)
        else:
            if result['result']:
                with open(os.path.join(tdir, 'rendered.jpeg'), 'rb') as f:
                    return f.read()
            else:
                report_error()


def check_ebook_format(stream, current_guess):
    ans = current_guess
    if current_guess.lower() in ('prc', 'mobi', 'azw', 'azw1', 'azw3'):
        stream.seek(0)
        if stream.read(3) == b'TPZ':
            ans = 'tpz'
        stream.seek(0)
    return ans


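The TPZ sniffing in check_ebook_format can be exercised in isolation with an in-memory stream; the following is a standalone re-statement of the function above, for illustration:

```python
import io

def check_ebook_format(stream, current_guess):
    # Files with a MOBI-family extension may actually be Topaz books,
    # which start with the magic bytes b'TPZ'.
    ans = current_guess
    if current_guess.lower() in ('prc', 'mobi', 'azw', 'azw1', 'azw3'):
        stream.seek(0)
        if stream.read(3) == b'TPZ':
            ans = 'tpz'
        stream.seek(0)
    return ans

print(check_ebook_format(io.BytesIO(b'TPZ0\x00'), 'mobi'))    # tpz
print(check_ebook_format(io.BytesIO(b'BOOKMOBI'), 'mobi'))    # mobi
print(check_ebook_format(io.BytesIO(b'TPZ0\x00'), 'epub'))    # epub
```

Note the stream is rewound before returning, so callers can read it again from the start.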
def normalize(x):
    if isinstance(x, unicode_type):
        import unicodedata
        x = unicodedata.normalize('NFC', x)
    return x


def calibre_cover(title, author_string, series_string=None,
                  output_format='jpg', title_size=46, author_size=36, logo_path=None):
    title = normalize(title)
    author_string = normalize(author_string)
    series_string = normalize(series_string)
    from calibre.ebooks.covers import calibre_cover2
    from calibre.utils.img import image_to_data
    ans = calibre_cover2(title, author_string or '', series_string or '', logo_path=logo_path, as_qimage=True)
    return image_to_data(ans, fmt=output_format)


UNIT_RE = re.compile(r'^(-*[0-9]*[.]?[0-9]*)\s*(%|em|ex|en|px|mm|cm|in|pt|pc|rem|q)$')


def unit_convert(value, base, font, dpi, body_font_size=12):
|
||||||
|
' Return value in pts'
|
||||||
|
if isinstance(value, numbers.Number):
|
||||||
|
return value
|
||||||
|
try:
|
||||||
|
return float(value) * 72.0 / dpi
|
||||||
|
except:
|
||||||
|
pass
|
||||||
|
result = value
|
||||||
|
m = UNIT_RE.match(value)
|
||||||
|
if m is not None and m.group(1):
|
||||||
|
value = float(m.group(1))
|
||||||
|
unit = m.group(2)
|
||||||
|
if unit == '%':
|
||||||
|
result = (value / 100.0) * base
|
||||||
|
elif unit == 'px':
|
||||||
|
result = value * 72.0 / dpi
|
||||||
|
elif unit == 'in':
|
||||||
|
result = value * 72.0
|
||||||
|
elif unit == 'pt':
|
||||||
|
result = value
|
||||||
|
elif unit == 'em':
|
||||||
|
result = value * font
|
||||||
|
elif unit in ('ex', 'en'):
|
||||||
|
# This is a hack for ex since we have no way to know
|
||||||
|
# the x-height of the font
|
||||||
|
font = font
|
||||||
|
result = value * font * 0.5
|
||||||
|
elif unit == 'pc':
|
||||||
|
result = value * 12.0
|
||||||
|
elif unit == 'mm':
|
||||||
|
result = value * 2.8346456693
|
||||||
|
elif unit == 'cm':
|
||||||
|
result = value * 28.346456693
|
||||||
|
elif unit == 'rem':
|
||||||
|
result = value * body_font_size
|
||||||
|
elif unit == 'q':
|
||||||
|
result = value * 0.708661417325
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
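The branches of `unit_convert` reduce to a fixed table of points-per-unit factors plus a few context-dependent units (`%`, `px`, `em`, `rem`). A minimal standalone sketch of the same conversions (`css_len_to_pts` is a hypothetical name for illustration, not part of this module):

```python
import re

# Same pattern as UNIT_RE above
UNIT_RE = re.compile(r'^(-*[0-9]*[.]?[0-9]*)\s*(%|em|ex|en|px|mm|cm|in|pt|pc|rem|q)$')


def css_len_to_pts(value, base=12.0, font=12.0, dpi=96.0, body_font_size=12.0):
    """Convert a CSS length string to points, mirroring unit_convert above."""
    m = UNIT_RE.match(value)
    if m is None or not m.group(1):
        return None
    v, unit = float(m.group(1)), m.group(2)
    # Units with a constant points-per-unit factor
    factors = {
        'pt': 1.0, 'in': 72.0, 'pc': 12.0,
        'mm': 2.8346456693, 'cm': 28.346456693,
        'q': 0.708661417325,
    }
    if unit == '%':
        return v / 100.0 * base
    if unit == 'px':
        return v * 72.0 / dpi
    if unit == 'em':
        return v * font
    if unit in ('ex', 'en'):
        return v * font * 0.5   # same x-height approximation as above
    if unit == 'rem':
        return v * body_font_size
    return v * factors[unit]
```

For example, `css_len_to_pts('1in')` gives 72.0 and `css_len_to_pts('96px', dpi=96.0)` also gives 72.0, since CSS pixels are defined relative to the rendering DPI.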
def parse_css_length(value):
    try:
        m = UNIT_RE.match(value)
    except TypeError:
        return None, None
    if m is not None and m.group(1):
        value = float(m.group(1))
        unit = m.group(2)
        return value, unit.lower()
    return None, None


def generate_masthead(title, output_path=None, width=600, height=60):
    from calibre.ebooks.conversion.config import load_defaults
    recs = load_defaults('mobi_output')
    masthead_font_family = recs.get('masthead_font', None)
    from calibre.ebooks.covers import generate_masthead
    return generate_masthead(title, output_path=output_path, width=width, height=height, font_family=masthead_font_family)


def escape_xpath_attr(value):
    if '"' in value:
        if "'" in value:
            parts = re.split('("+)', value)
            ans = []
            for x in parts:
                if x:
                    q = "'" if '"' in x else '"'
                    ans.append(q + x + q)
            return 'concat(%s)' % ', '.join(ans)
        else:
            return "'%s'" % value
    return '"%s"' % value
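XPath 1.0 string literals have no escape sequences, so a value containing both quote characters cannot be written as a single literal; `escape_xpath_attr` works around this with `concat()`. A self-contained copy of the function, reproduced here only so its behavior can be exercised in isolation:

```python
import re


def escape_xpath_attr(value):
    # Same strategy as the module function above: prefer double quotes,
    # fall back to single quotes, and stitch mixed-quote values together
    # with XPath concat().
    if '"' in value:
        if "'" in value:
            parts = re.split('("+)', value)
            ans = []
            for x in parts:
                if x:
                    q = "'" if '"' in x else '"'
                    ans.append(q + x + q)
            return 'concat(%s)' % ', '.join(ans)
        return "'%s'" % value
    return '"%s"' % value
```

A plain value is simply double-quoted, a value with double quotes is single-quoted, and only the mixed case pays the cost of `concat()`.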
189
ebook_converter/ebooks/chardet.py
Normal file
@@ -0,0 +1,189 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import re, codecs
from polyglot.builtins import unicode_type

_encoding_pats = (
    # XML declaration
    r'<\?[^<>]+encoding\s*=\s*[\'"](.*?)[\'"][^<>]*>',
    # HTML 5 charset
    r'''<meta\s+charset=['"]([-_a-z0-9]+)['"][^<>]*>(?:\s*</meta>){0,1}''',
    # HTML 4 Pragma directive
    r'''<meta\s+?[^<>]*?content\s*=\s*['"][^'"]*?charset=([-_a-z0-9]+)[^'"]*?['"][^<>]*>(?:\s*</meta>){0,1}''',
)


def compile_pats(binary):
    for raw in _encoding_pats:
        if binary:
            raw = raw.encode('ascii')
        yield re.compile(raw, flags=re.IGNORECASE)


class LazyEncodingPats(object):

    def __call__(self, binary=False):
        attr = 'binary_pats' if binary else 'unicode_pats'
        pats = getattr(self, attr, None)
        if pats is None:
            pats = tuple(compile_pats(binary))
            setattr(self, attr, pats)
        for pat in pats:
            yield pat


lazy_encoding_pats = LazyEncodingPats()
ENTITY_PATTERN = re.compile(r'&(\S+?);')


def strip_encoding_declarations(raw, limit=50*1024, preserve_newlines=False):
    prefix = raw[:limit]
    suffix = raw[limit:]
    is_binary = isinstance(raw, bytes)
    if preserve_newlines:
        if is_binary:
            sub = lambda m: b'\n' * m.group().count(b'\n')
        else:
            sub = lambda m: '\n' * m.group().count('\n')
    else:
        sub = b'' if is_binary else u''
    for pat in lazy_encoding_pats(is_binary):
        prefix = pat.sub(sub, prefix)
    raw = prefix + suffix
    return raw


def replace_encoding_declarations(raw, enc='utf-8', limit=50*1024):
    prefix = raw[:limit]
    suffix = raw[limit:]
    changed = [False]
    is_binary = isinstance(raw, bytes)
    if is_binary:
        if not isinstance(enc, bytes):
            enc = enc.encode('ascii')
    else:
        if isinstance(enc, bytes):
            enc = enc.decode('ascii')

    def sub(m):
        ans = m.group()
        if m.group(1).lower() != enc.lower():
            changed[0] = True
            start, end = m.start(1) - m.start(0), m.end(1) - m.end(0)
            ans = ans[:start] + enc + ans[end:]
        return ans

    for pat in lazy_encoding_pats(is_binary):
        prefix = pat.sub(sub, prefix)
    raw = prefix + suffix
    return raw, changed[0]


def find_declared_encoding(raw, limit=50*1024):
    prefix = raw[:limit]
    is_binary = isinstance(raw, bytes)
    for pat in lazy_encoding_pats(is_binary):
        m = pat.search(prefix)
        if m is not None:
            ans = m.group(1)
            if is_binary:
                ans = ans.decode('ascii', 'replace')
            return ans


def substitute_entites(raw):
    from calibre import xml_entity_to_unicode
    return ENTITY_PATTERN.sub(xml_entity_to_unicode, raw)


_CHARSET_ALIASES = {"macintosh" : "mac-roman",
                    "x-sjis" : "shift-jis"}


def detect(*args, **kwargs):
    from chardet import detect
    return detect(*args, **kwargs)


def force_encoding(raw, verbose, assume_utf8=False):
    from calibre.constants import preferred_encoding

    try:
        chardet = detect(raw[:1024*50])
    except:
        chardet = {'encoding':preferred_encoding, 'confidence':0}
    encoding = chardet['encoding']
    if chardet['confidence'] < 1 and assume_utf8:
        encoding = 'utf-8'
    if chardet['confidence'] < 1 and verbose:
        print('WARNING: Encoding detection confidence for %s is %d%%'%(
            chardet['encoding'], chardet['confidence']*100))
    if not encoding:
        encoding = preferred_encoding
    encoding = encoding.lower()
    encoding = _CHARSET_ALIASES.get(encoding, encoding)
    if encoding == 'ascii':
        encoding = 'utf-8'
    return encoding


def detect_xml_encoding(raw, verbose=False, assume_utf8=False):
    if not raw or isinstance(raw, unicode_type):
        return raw, None
    for x in ('utf8', 'utf-16-le', 'utf-16-be'):
        bom = getattr(codecs, 'BOM_'+x.upper().replace('-16', '16').replace(
            '-', '_'))
        if raw.startswith(bom):
            return raw[len(bom):], x
    encoding = None
    for pat in lazy_encoding_pats(True):
        match = pat.search(raw)
        if match:
            encoding = match.group(1)
            encoding = encoding.decode('ascii', 'replace')
            break
    if encoding is None:
        encoding = force_encoding(raw, verbose, assume_utf8=assume_utf8)
    if encoding.lower().strip() == 'macintosh':
        encoding = 'mac-roman'
    if encoding.lower().replace('_', '-').strip() in (
            'gb2312', 'chinese', 'csiso58gb231280', 'euc-cn', 'euccn',
            'eucgb2312-cn', 'gb2312-1980', 'gb2312-80', 'iso-ir-58'):
        # Microsoft Word exports to HTML with encoding incorrectly set to
        # gb2312 instead of gbk. gbk is a superset of gb2312, anyway.
        encoding = 'gbk'
    try:
        codecs.lookup(encoding)
    except LookupError:
        encoding = 'utf-8'

    return raw, encoding


def xml_to_unicode(raw, verbose=False, strip_encoding_pats=False,
                   resolve_entities=False, assume_utf8=False):
    '''
    Force conversion of byte string to unicode. Tries to look for XML/HTML
    encoding declaration first, if not found uses the chardet library and
    prints a warning if detection confidence is < 100%
    @return: (unicode, encoding used)
    '''
    if not raw:
        return '', None
    raw, encoding = detect_xml_encoding(raw, verbose=verbose,
                                        assume_utf8=assume_utf8)
    if not isinstance(raw, unicode_type):
        raw = raw.decode(encoding, 'replace')

    if strip_encoding_pats:
        raw = strip_encoding_declarations(raw)
    if resolve_entities:
        raw = substitute_entites(raw)

    return raw, encoding
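The three `_encoding_pats` regexes above are what both `find_declared_encoding` and `strip_encoding_declarations` are built on: group 1 captures the declared charset, and substituting the whole match removes the declaration. A standalone check against the XML-declaration pattern (the sample document here is invented for illustration):

```python
import re

# The XML-declaration pattern from _encoding_pats above
xml_pat = re.compile(r'<\?[^<>]+encoding\s*=\s*[\'"](.*?)[\'"][^<>]*>',
                     flags=re.IGNORECASE)

raw = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<root/>'

# What find_declared_encoding would report for this document
m = xml_pat.search(raw)
declared = m.group(1) if m else None

# What strip_encoding_declarations does with the same match
stripped = xml_pat.sub('', raw)
```

Stripping matters because `xml_to_unicode` may decode with a different codec than the document declares; leaving the stale declaration in place would mislead downstream XML parsers.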
6
ebook_converter/ebooks/compression/__init__.py
Normal file
@@ -0,0 +1,6 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'
238
ebook_converter/ebooks/compression/palmdoc.c
Normal file
@@ -0,0 +1,238 @@
/*
:mod:`cPalmdoc` -- Palmdoc compression/decompression
=====================================================

.. module:: cPalmdoc
    :platform: All
    :synopsis: Compression decompression of Palmdoc implemented in C for speed

.. moduleauthor:: Kovid Goyal <kovid@kovidgoyal.net> Copyright 2009

*/

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <stdio.h>

#define BUFFER 6000

#define MIN(x, y) ( ((x) < (y)) ? (x) : (y) )
#define MAX(x, y) ( ((x) > (y)) ? (x) : (y) )

typedef unsigned short int Byte;
typedef struct {
    Byte *data;
    Py_ssize_t len;
} buffer;

#ifdef bool
#undef bool
#endif
#define bool int

#ifdef false
#undef false
#endif
#define false 0

#ifdef true
#undef true
#endif
#define true 1

#define CHAR(x) (( (x) > 127 ) ? (x)-256 : (x))

#if PY_MAJOR_VERSION >= 3
#define BUFFER_FMT "y#"
#define BYTES_FMT "y#"
#else
#define BUFFER_FMT "t#"
#define BYTES_FMT "s#"
#endif

static PyObject *
cpalmdoc_decompress(PyObject *self, PyObject *args) {
    const char *_input = NULL; Py_ssize_t input_len = 0;
    Byte *input; char *output; Byte c; PyObject *ans;
    Py_ssize_t i = 0, o = 0, j = 0, di, n;
    if (!PyArg_ParseTuple(args, BUFFER_FMT, &_input, &input_len))
        return NULL;
    input = (Byte *) PyMem_Malloc(sizeof(Byte)*input_len);
    if (input == NULL) return PyErr_NoMemory();
    // Map chars to bytes
    for (j = 0; j < input_len; j++)
        input[j] = (_input[j] < 0) ? _input[j]+256 : _input[j];
    output = (char *)PyMem_Malloc(sizeof(char)*(MAX(BUFFER, 8*input_len)));
    if (output == NULL) return PyErr_NoMemory();

    while (i < input_len) {
        c = input[i++];
        if (c >= 1 && c <= 8)  // copy 'c' bytes
            while (c--) output[o++] = (char)input[i++];

        else if (c <= 0x7F)  // 0, 09-7F = self
            output[o++] = (char)c;

        else if (c >= 0xC0) { // space + ASCII char
            output[o++] = ' ';
            output[o++] = c ^ 0x80;
        }
        else { // 80-BF repeat sequences
            c = (c << 8) + input[i++];
            di = (c & 0x3FFF) >> 3;
            for ( n = (c & 7) + 3; n--; ++o )
                output[o] = output[o - di];
        }
    }
    ans = Py_BuildValue(BYTES_FMT, output, o);
    if (output != NULL) PyMem_Free(output);
    if (input != NULL) PyMem_Free(input);
    return ans;
}

static bool
cpalmdoc_memcmp( Byte *a, Byte *b, Py_ssize_t len) {
    Py_ssize_t i;
    for (i = 0; i < len; i++) if (a[i] != b[i]) return false;
    return true;
}

static Py_ssize_t
cpalmdoc_rfind(Byte *data, Py_ssize_t pos, Py_ssize_t chunk_length) {
    Py_ssize_t i;
    for (i = pos - chunk_length; i > -1; i--)
        if (cpalmdoc_memcmp(data+i, data+pos, chunk_length)) return i;
    return pos;
}


static Py_ssize_t
cpalmdoc_do_compress(buffer *b, char *output) {
    Py_ssize_t i = 0, j, chunk_len, dist;
    unsigned int compound;
    Byte c, n;
    bool found;
    char *head;
    buffer temp;
    head = output;
    temp.data = (Byte *)PyMem_Malloc(sizeof(Byte)*8); temp.len = 0;
    if (temp.data == NULL) return 0;
    while (i < b->len) {
        c = b->data[i];
        //do repeats
        if ( i > 10 && (b->len - i) > 10) {
            found = false;
            for (chunk_len = 10; chunk_len > 2; chunk_len--) {
                j = cpalmdoc_rfind(b->data, i, chunk_len);
                dist = i - j;
                if (j < i && dist <= 2047) {
                    found = true;
                    compound = (unsigned int)((dist << 3) + chunk_len-3);
                    *(output++) = CHAR(0x80 + (compound >> 8 ));
                    *(output++) = CHAR(compound & 0xFF);
                    i += chunk_len;
                    break;
                }
            }
            if (found) continue;
        }

        //write single character
        i++;
        if (c == 32 && i < b->len) {
            n = b->data[i];
            if ( n >= 0x40 && n <= 0x7F) {
                *(output++) = CHAR(n^0x80); i++; continue;
            }
        }
        if (c == 0 || (c > 8 && c < 0x80))
            *(output++) = CHAR(c);
        else { // Write binary data
            j = i;
            temp.data[0] = c; temp.len = 1;
            while (j < b->len && temp.len < 8) {
                c = b->data[j];
                if (c == 0 || (c > 8 && c < 0x80)) break;
                temp.data[temp.len++] = c; j++;
            }
            i += temp.len - 1;
            *(output++) = (char)temp.len;
            for (j=0; j < temp.len; j++) *(output++) = (char)temp.data[j];
        }
    }
    PyMem_Free(temp.data);
    return output - head;
}

static PyObject *
cpalmdoc_compress(PyObject *self, PyObject *args) {
    const char *_input = NULL; Py_ssize_t input_len = 0;
    char *output; PyObject *ans;
    Py_ssize_t j = 0;
    buffer b;
    if (!PyArg_ParseTuple(args, BUFFER_FMT, &_input, &input_len))
        return NULL;
    b.data = (Byte *)PyMem_Malloc(sizeof(Byte)*input_len);
    if (b.data == NULL) return PyErr_NoMemory();
    // Map chars to bytes
    for (j = 0; j < input_len; j++)
        b.data[j] = (_input[j] < 0) ? _input[j]+256 : _input[j];
    b.len = input_len;
    // Make the output buffer larger than the input as sometimes
    // compression results in a larger block
    output = (char *)PyMem_Malloc(sizeof(char) * (int)(1.25*b.len));
    if (output == NULL) return PyErr_NoMemory();
    j = cpalmdoc_do_compress(&b, output);
    if ( j == 0) return PyErr_NoMemory();
    ans = Py_BuildValue(BYTES_FMT, output, j);
    PyMem_Free(output);
    PyMem_Free(b.data);
    return ans;
}

static char cPalmdoc_doc[] = "Compress and decompress palmdoc strings.";

static PyMethodDef cPalmdoc_methods[] = {
    {"decompress", cpalmdoc_decompress, METH_VARARGS,
    "decompress(bytestring) -> decompressed bytestring\n\n"
            "Decompress a palmdoc compressed byte string. "
    },

    {"compress", cpalmdoc_compress, METH_VARARGS,
    "compress(bytestring) -> compressed bytestring\n\n"
            "Palmdoc compress a byte string. "
    },
    {NULL, NULL, 0, NULL}
};

#if PY_MAJOR_VERSION >= 3
#define INITERROR return NULL
#define INITMODULE PyModule_Create(&cPalmdoc_module)
static struct PyModuleDef cPalmdoc_module = {
    /* m_base     */ PyModuleDef_HEAD_INIT,
    /* m_name     */ "cPalmdoc",
    /* m_doc      */ cPalmdoc_doc,
    /* m_size     */ -1,
    /* m_methods  */ cPalmdoc_methods,
    /* m_slots    */ 0,
    /* m_traverse */ 0,
    /* m_clear    */ 0,
    /* m_free     */ 0,
};
CALIBRE_MODINIT_FUNC PyInit_cPalmdoc(void) {
#else
#define INITERROR return
#define INITMODULE Py_InitModule3("cPalmdoc", cPalmdoc_methods, cPalmdoc_doc)
CALIBRE_MODINIT_FUNC initcPalmdoc(void) {
#endif

    PyObject *m;
    m = INITMODULE;
    if (m == NULL) {
        INITERROR;
    }

#if PY_MAJOR_VERSION >= 3
    return m;
#endif
}
96
ebook_converter/ebooks/compression/palmdoc.py
Normal file
@@ -0,0 +1,96 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'

import io
from struct import pack

from calibre.constants import plugins
from polyglot.builtins import range
cPalmdoc = plugins['cPalmdoc'][0]
if not cPalmdoc:
    raise RuntimeError(('Failed to load required cPalmdoc module: '
            '%s')%plugins['cPalmdoc'][1])


def decompress_doc(data):
    return cPalmdoc.decompress(data)


def compress_doc(data):
    return cPalmdoc.compress(data) if data else b''


def py_compress_doc(data):
    out = io.BytesIO()
    i = 0
    ldata = len(data)
    while i < ldata:
        if i > 10 and (ldata - i) > 10:
            chunk = b''
            match = -1
            for j in range(10, 2, -1):
                chunk = data[i:i+j]
                try:
                    match = data.rindex(chunk, 0, i)
                except ValueError:
                    continue
                if (i - match) <= 2047:
                    break
                match = -1
            if match >= 0:
                n = len(chunk)
                m = i - match
                code = 0x8000 + ((m << 3) & 0x3ff8) + (n - 3)
                out.write(pack('>H', code))
                i += n
                continue
        ch = data[i:i+1]
        och = ord(ch)
        i += 1
        if ch == b' ' and (i + 1) < ldata:
            onch = ord(data[i:i+1])
            if onch >= 0x40 and onch < 0x80:
                out.write(pack('>B', onch ^ 0x80))
                i += 1
                continue
        if och == 0 or (och > 8 and och < 0x80):
            out.write(ch)
        else:
            j = i
            binseq = [ch]
            while j < ldata and len(binseq) < 8:
                ch = data[j:j+1]
                och = ord(ch)
                if och == 0 or (och > 8 and och < 0x80):
                    break
                binseq.append(ch)
                j += 1
            out.write(pack('>B', len(binseq)))
            out.write(b''.join(binseq))
            i += len(binseq) - 1
    return out.getvalue()


def find_tests():
    import unittest

    class Test(unittest.TestCase):

        def test_palmdoc_compression(self):
            for test in [
                b'abc\x03\x04\x05\x06ms',  # Test binary writing
                b'a b c \xfed ',  # Test encoding of spaces
                b'0123456789axyz2bxyz2cdfgfo9iuyerh',
                b'0123456789asd0123456789asd|yyzzxxffhhjjkk',
                (b'ciewacnaq eiu743 r787q 0w% ; sa fd\xef\ffdxosac wocjp acoiecowei '
                    b'owaic jociowapjcivcjpoivjporeivjpoavca; p9aw8743y6r74%$^$^%8 ')
            ]:
                x = compress_doc(test)
                self.assertEqual(py_compress_doc(test), x)
                self.assertEqual(decompress_doc(x), test)

    return unittest.defaultTestLoader.loadTestsFromTestCase(Test)
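The PalmDoc byte format handled by the two files above has four token types: a length byte 1-8 followed by verbatim bytes, a literal byte (0x00 or 0x09-0x7F), a space-plus-ASCII pair packed into one byte >= 0xC0, and a two-byte back-reference (first byte 0x80-0xBF) encoding distance and length. The pair below exercises that format without the C extension: `pd_compress` follows `py_compress_doc`, and `pd_decompress` is a Python port of the token handling in `cpalmdoc_decompress` (both names are hypothetical, for illustration only):

```python
import io
from struct import pack


def pd_compress(data):
    # Same algorithm as py_compress_doc above.
    out = io.BytesIO()
    i, ldata = 0, len(data)
    while i < ldata:
        if i > 10 and (ldata - i) > 10:
            chunk, match = b'', -1
            for j in range(10, 2, -1):
                chunk = data[i:i+j]
                try:
                    match = data.rindex(chunk, 0, i)
                except ValueError:
                    continue
                if (i - match) <= 2047:
                    break
                match = -1
            if match >= 0:
                n, m = len(chunk), i - match
                # Back-reference: distance in bits 3-13, length-3 in bits 0-2
                out.write(pack('>H', 0x8000 + ((m << 3) & 0x3ff8) + (n - 3)))
                i += n
                continue
        ch = data[i:i+1]
        och = ch[0]
        i += 1
        if ch == b' ' and (i + 1) < ldata:
            onch = data[i]
            if 0x40 <= onch < 0x80:       # pack space + ASCII into one byte
                out.write(pack('>B', onch ^ 0x80))
                i += 1
                continue
        if och == 0 or 8 < och < 0x80:
            out.write(ch)                 # plain literal
        else:                             # run of "binary" bytes, length 1..8
            j, binseq = i, [ch]
            while j < ldata and len(binseq) < 8:
                ch = data[j:j+1]
                if ch[0] == 0 or 8 < ch[0] < 0x80:
                    break
                binseq.append(ch)
                j += 1
            out.write(pack('>B', len(binseq)))
            out.write(b''.join(binseq))
            i += len(binseq) - 1
    return out.getvalue()


def pd_decompress(data):
    # Python port of the token handling in cpalmdoc_decompress above.
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if 1 <= c <= 8:                   # copy next c bytes verbatim
            out += data[i:i+c]
            i += c
        elif c <= 0x7F:                   # literal byte
            out.append(c)
        elif c >= 0xC0:                   # space + ASCII char
            out.append(0x20)
            out.append(c ^ 0x80)
        else:                             # 0x80-0xBF: back-reference
            c = (c << 8) + data[i]
            i += 1
            dist = (c & 0x3FFF) >> 3
            for _ in range((c & 7) + 3):
                out.append(out[-dist])
    return bytes(out)
```

Round-tripping the vectors from `find_tests` above (`pd_decompress(pd_compress(t)) == t`) confirms the two ports agree on the format.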
30
ebook_converter/ebooks/conversion/__init__.py
Normal file
@@ -0,0 +1,30 @@
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

from polyglot.builtins import native_string_type


class ConversionUserFeedBack(Exception):

    def __init__(self, title, msg, level='info', det_msg=''):
        ''' Show a simple message to the user

        :param title: The title (very short description)
        :param msg: The message to show the user
        :param level: Must be one of 'info', 'warn' or 'error'
        :param det_msg: Optional detailed message to show the user
        '''
        import json
        Exception.__init__(self, json.dumps({'msg':msg, 'level':level,
            'det_msg':det_msg, 'title':title}))
        self.title, self.msg, self.det_msg = title, msg, det_msg
        self.level = level


# Ensure exception uses fully qualified name as this is used to detect it in
# the GUI.
ConversionUserFeedBack.__name__ = native_string_type('calibre.ebooks.conversion.ConversionUserFeedBack')
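`ConversionUserFeedBack` stores its whole payload as a JSON string in the exception's first argument, so a consumer that only sees the exception text can still recover the structured fields. A minimal stand-in sketch of that round-trip (the class body mirrors the one above; the sample title and message are invented):

```python
import json


class ConversionUserFeedBack(Exception):
    # Minimal stand-in mirroring the class above: the first exception
    # argument is a JSON payload describing the message to show the user.
    def __init__(self, title, msg, level='info', det_msg=''):
        Exception.__init__(self, json.dumps({'msg': msg, 'level': level,
            'det_msg': det_msg, 'title': title}))
        self.title, self.msg, self.det_msg = title, msg, det_msg
        self.level = level


# A consumer holding only the raw exception args can rebuild the fields
payload = json.loads(ConversionUserFeedBack('Bad input', 'Cannot parse').args[0])
```

This is why the module pins `__name__` to a fully qualified string: the GUI matches the exception by name across a process boundary, then parses `args[0]` as JSON.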
428
ebook_converter/ebooks/conversion/cli.py
Normal file
428
ebook_converter/ebooks/conversion/cli.py
Normal file
@@ -0,0 +1,428 @@
|
|||||||
|
from __future__ import absolute_import, division, print_function, unicode_literals
|
||||||
|
|
||||||
|
__license__ = 'GPL 3'
|
||||||
|
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
|
||||||
|
__docformat__ = 'restructuredtext en'
|
||||||
|
|
||||||
|
'''
|
||||||
|
Command line interface to conversion sub-system
|
||||||
|
'''
|
||||||
|
|
||||||
|
import sys, os, numbers
|
||||||
|
from optparse import OptionGroup, Option
|
||||||
|
from collections import OrderedDict
|
||||||
|
|
||||||
|
from calibre.utils.config import OptionParser
|
||||||
|
from calibre.utils.logging import Log
|
||||||
|
from calibre.customize.conversion import OptionRecommendation
|
||||||
|
from calibre import patheq
|
||||||
|
from calibre.ebooks.conversion import ConversionUserFeedBack
|
||||||
|
from calibre.utils.localization import localize_user_manual_link
|
||||||
|
from polyglot.builtins import iteritems
|
||||||
|
|
||||||
|
USAGE = '%prog ' + _('''\
|
||||||
|
input_file output_file [options]
|
||||||
|
|
||||||
|
Convert an e-book from one format to another.
|
||||||
|
|
||||||
|
input_file is the input and output_file is the output. Both must be \
|
||||||
|
specified as the first two arguments to the command.
|
||||||
|
|
||||||
|
The output e-book format is guessed from the file extension of \
|
||||||
|
output_file. output_file can also be of the special format .EXT where \
|
||||||
|
EXT is the output file extension. In this case, the name of the output \
|
||||||
|
file is derived from the name of the input file. Note that the filenames must \
|
||||||
|
not start with a hyphen. Finally, if output_file has no extension, then \
|
||||||
|
it is treated as a directory and an "open e-book" (OEB) consisting of HTML \
|
||||||
|
files is written to that directory. These files are the files that would \
|
||||||
|
normally have been passed to the output plugin.
|
||||||
|
|
||||||
|
After specifying the input \
|
||||||
|
and output file you can customize the conversion by specifying various \
|
||||||
|
options. The available options depend on the input and output file types. \
|
||||||
|
To get help on them specify the input and output file and then use the -h \
|
||||||
|
option.
|
||||||
|
|
||||||
|
For full documentation of the conversion system see
|
||||||
|
''') + localize_user_manual_link('https://manual.calibre-ebook.com/conversion.html')
|
||||||
|
|
||||||
|
HEURISTIC_OPTIONS = ['markup_chapter_headings',
|
||||||
|
'italicize_common_cases', 'fix_indents',
|
||||||
|
'html_unwrap_factor', 'unwrap_lines',
|
||||||
|
'delete_blank_paragraphs', 'format_scene_breaks',
|
||||||
|
'dehyphenate', 'renumber_headings',
|
||||||
|
'replace_scene_breaks']
|
||||||
|
|
||||||
|
DEFAULT_TRUE_OPTIONS = HEURISTIC_OPTIONS + ['remove_fake_margins']
|
||||||
|
|
||||||
|
|
||||||
|
def print_help(parser, log):
|
||||||
|
parser.print_help()
|
||||||
|
|
||||||
|
|
||||||
|
def check_command_line_options(parser, args, log):
|
||||||
|
if len(args) < 3 or args[1].startswith('-') or args[2].startswith('-'):
|
||||||
|
print_help(parser, log)
|
||||||
|
log.error('\n\nYou must specify the input AND output files')
|
||||||
|
raise SystemExit(1)
|
||||||
|
|
||||||
|
input = os.path.abspath(args[1])
|
||||||
|
if not input.endswith('.recipe') and not os.access(input, os.R_OK) and not \
|
||||||
|
('-h' in args or '--help' in args):
|
||||||
|
log.error('Cannot read from', input)
|
||||||
|
raise SystemExit(1)
|
||||||
|
if input.endswith('.recipe') and not os.access(input, os.R_OK):
|
||||||
|
input = args[1]
|
||||||
|
|
||||||
|
output = args[2]
|
||||||
|
if (output.startswith('.') and output[:2] not in {'..', '.'} and '/' not in
|
||||||
|
output and '\\' not in output):
|
||||||
|
output = os.path.splitext(os.path.basename(input))[0]+output
|
||||||
|
output = os.path.abspath(output)
|
||||||
|
|
||||||
|
return input, output
|
||||||
|
|
||||||
|
|
||||||
|
def option_recommendation_to_cli_option(add_option, rec):
|
||||||
|
opt = rec.option
|
||||||
|
switches = ['-'+opt.short_switch] if opt.short_switch else []
|
||||||
|
switches.append('--'+opt.long_switch)
|
||||||
|
attrs = dict(dest=opt.name, help=opt.help,
|
||||||
|
choices=opt.choices, default=rec.recommended_value)
|
||||||
|
if isinstance(rec.recommended_value, type(True)):
|
||||||
|
attrs['action'] = 'store_false' if rec.recommended_value else \
|
||||||
|
'store_true'
|
||||||
|
else:
|
||||||
|
if isinstance(rec.recommended_value, numbers.Integral):
|
||||||
|
attrs['type'] = 'int'
|
||||||
|
if isinstance(rec.recommended_value, numbers.Real):
|
||||||
|
attrs['type'] = 'float'
|
||||||
|
|
||||||
|
if opt.long_switch == 'verbose':
|
||||||
|
attrs['action'] = 'count'
|
||||||
|
attrs.pop('type', '')
|
||||||
|
if opt.name == 'read_metadata_from_opf':
|
||||||
|
switches.append('--from-opf')
|
||||||
|
if opt.name == 'transform_css_rules':
|
||||||
|
attrs['help'] = _(
|
||||||
|
'Path to a file containing rules to transform the CSS styles'
|
||||||
|
' in this book. The easiest way to create such a file is to'
|
||||||
|
' use the wizard for creating rules in the calibre GUI. Access'
|
||||||
|
' it in the "Look & feel->Transform styles" section of the conversion'
|
||||||
|
' dialog. Once you create the rules, you can use the "Export" button'
|
||||||
|
' to save them to a file.'
|
||||||
|
)
|
||||||
|
if opt.name in DEFAULT_TRUE_OPTIONS and rec.recommended_value is True:
|
||||||
|
switches = ['--disable-'+opt.long_switch]
|
||||||
|
add_option(Option(*switches, **attrs))
|
||||||
|
|
||||||
|
|
||||||
|
def group_titles():
|
||||||
|
return _('INPUT OPTIONS'), _('OUTPUT OPTIONS')
|
||||||
|
|
||||||
|
|
||||||
|
def recipe_test(option, opt_str, value, parser):
|
||||||
|
assert value is None
|
||||||
|
value = []
|
||||||
|
|
||||||
|
def floatable(s):
|
||||||
|
try:
|
||||||
|
float(s)
|
||||||
|
return True
|
||||||
|
except ValueError:
|
||||||
|
return False
|
||||||
|
|
||||||
|
for arg in parser.rargs:
|
||||||
|
# stop on --foo like options
|
||||||
|
if arg[:2] == "--":
|
||||||
|
break
|
||||||
|
# stop on -a, but not on -3 or -3.0
|
||||||
|
if arg[:1] == "-" and len(arg) > 1 and not floatable(arg):
|
||||||
|
break
|
||||||
|
try:
|
||||||
|
value.append(int(arg))
|
||||||
|
except (TypeError, ValueError, AttributeError):
|
||||||
|
break
|
||||||
|
if len(value) == 2:
|
||||||
|
break
|
||||||
|
del parser.rargs[:len(value)]
|
||||||
|
|
||||||
|
while len(value) < 2:
|
||||||
|
value.append(2)
|
||||||
|
|
||||||
|
setattr(parser.values, option.dest, tuple(value))
|
||||||
|
|
||||||
|
|
||||||
|
def add_input_output_options(parser, plumber):
|
||||||
|
input_options, output_options = \
|
||||||
|
plumber.input_options, plumber.output_options
|
||||||
|
|
||||||
|
def add_options(group, options):
|
||||||
|
for opt in options:
|
||||||
|
if plumber.input_fmt == 'recipe' and opt.option.long_switch == 'test':
|
||||||
|
group(Option('--test', dest='test', action='callback', callback=recipe_test))
|
||||||
|
else:
|
||||||
|
option_recommendation_to_cli_option(group, opt)
|
||||||
|
|
||||||
|
if input_options:
|
||||||
|
title = group_titles()[0]
|
||||||
|
io = OptionGroup(parser, title, _('Options to control the processing'
|
||||||
|
' of the input %s file')%plumber.input_fmt)
|
||||||
|
add_options(io.add_option, input_options)
|
||||||
|
parser.add_option_group(io)
|
||||||
|
|
||||||
|
if output_options:
|
||||||
|
title = group_titles()[1]
|
||||||
|
oo = OptionGroup(parser, title, _('Options to control the processing'
|
||||||
|
' of the output %s')%plumber.output_fmt)
|
||||||
|
add_options(oo.add_option, output_options)
|
||||||
|
parser.add_option_group(oo)
|
||||||
|
|
||||||
|
|
||||||
|
def add_pipeline_options(parser, plumber):
    groups = OrderedDict((
        ('', ('',
            [
                'input_profile',
                'output_profile',
            ]
        )),
        (_('LOOK AND FEEL'), (
            _('Options to control the look and feel of the output'),
            [
                'base_font_size', 'disable_font_rescaling',
                'font_size_mapping', 'embed_font_family',
                'subset_embedded_fonts', 'embed_all_fonts',
                'line_height', 'minimum_line_height',
                'linearize_tables',
                'extra_css', 'filter_css', 'transform_css_rules', 'expand_css',
                'smarten_punctuation', 'unsmarten_punctuation',
                'margin_top', 'margin_left', 'margin_right',
                'margin_bottom', 'change_justification',
                'insert_blank_line', 'insert_blank_line_size',
                'remove_paragraph_spacing',
                'remove_paragraph_spacing_indent_size',
                'asciiize', 'keep_ligatures',
            ]
        )),

        (_('HEURISTIC PROCESSING'), (
            _('Modify the document text and structure using common'
              ' patterns. Disabled by default. Use %(en)s to enable. '
              ' Individual actions can be disabled with the %(dis)s options.')
            % dict(en='--enable-heuristics', dis='--disable-*'),
            ['enable_heuristics'] + HEURISTIC_OPTIONS
        )),

        (_('SEARCH AND REPLACE'), (
            _('Modify the document text and structure using user defined patterns.'),
            [
                'sr1_search', 'sr1_replace',
                'sr2_search', 'sr2_replace',
                'sr3_search', 'sr3_replace',
                'search_replace',
            ]
        )),

        (_('STRUCTURE DETECTION'), (
            _('Control auto-detection of document structure.'),
            [
                'chapter', 'chapter_mark',
                'prefer_metadata_cover', 'remove_first_image',
                'insert_metadata', 'page_breaks_before',
                'remove_fake_margins', 'start_reading_at',
            ]
        )),

        (_('TABLE OF CONTENTS'), (
            _('Control the automatic generation of a Table of Contents. By '
              'default, if the source file has a Table of Contents, it will '
              'be used in preference to the automatically generated one.'),
            [
                'level1_toc', 'level2_toc', 'level3_toc',
                'toc_threshold', 'max_toc_links', 'no_chapters_in_toc',
                'use_auto_toc', 'toc_filter', 'duplicate_links_in_toc',
            ]
        )),

        (_('METADATA'), (_('Options to set metadata in the output'),
                         plumber.metadata_option_names + ['read_metadata_from_opf'],
                         )),
        (_('DEBUG'), (_('Options to help with debugging the conversion'),
                      [
                          'verbose',
                          'debug_pipeline',
                      ])),
    ))

    for group, (desc, options) in iteritems(groups):
        if group:
            group = OptionGroup(parser, group, desc)
            parser.add_option_group(group)
        add_option = group.add_option if group != '' else parser.add_option

        for name in options:
            rec = plumber.get_option_by_name(name)
            if rec.level < rec.HIGH:
                option_recommendation_to_cli_option(add_option, rec)


def option_parser():
    parser = OptionParser(usage=USAGE)
    parser.add_option('--list-recipes', default=False, action='store_true',
            help=_('List builtin recipe names. You can create an e-book from '
                'a builtin recipe like this: ebook-convert "Recipe Name.recipe" '
                'output.epub'))
    return parser


class ProgressBar(object):

    def __init__(self, log):
        self.log = log

    def __call__(self, frac, msg=''):
        if msg:
            percent = int(frac*100)
            self.log('%d%% %s'%(percent, msg))


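`ProgressBar` is just a reporter callable: the conversion pipeline invokes it with a completion fraction and an optional message, and it forwards a formatted line to whatever log callable it wraps. A small usage sketch (using a plain list as a stand-in log):

```python
class ProgressBar(object):

    def __init__(self, log):
        self.log = log

    def __call__(self, frac, msg=''):
        # Only log when there is an accompanying message.
        if msg:
            percent = int(frac * 100)
            self.log('%d%% %s' % (percent, msg))

messages = []
bar = ProgressBar(messages.append)  # any callable works as the log
bar(0.5, 'converting')
bar(0.75)           # no message, so nothing is logged
print(messages)     # ['50% converting']
```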
def create_option_parser(args, log):
    if '--version' in args:
        from calibre.constants import __appname__, __version__, __author__
        log(os.path.basename(args[0]), '('+__appname__, __version__+')')
        log('Created by:', __author__)
        raise SystemExit(0)
    if '--list-recipes' in args:
        from calibre.web.feeds.recipes.collection import get_builtin_recipe_titles
        log('Available recipes:')
        titles = sorted(get_builtin_recipe_titles())
        for title in titles:
            try:
                log('\t'+title)
            except:
                log('\t'+repr(title))
        log('%d recipes available'%len(titles))
        raise SystemExit(0)

    parser = option_parser()
    if len(args) < 3:
        print_help(parser, log)
        if any(x in args for x in ('-h', '--help')):
            raise SystemExit(0)
        else:
            raise SystemExit(1)

    input, output = check_command_line_options(parser, args, log)

    from calibre.ebooks.conversion.plumber import Plumber

    reporter = ProgressBar(log)
    if patheq(input, output):
        raise ValueError('Input file is the same as the output file')

    plumber = Plumber(input, output, log, reporter)
    add_input_output_options(parser, plumber)
    add_pipeline_options(parser, plumber)

    return parser, plumber


def abspath(x):
    if x.startswith('http:') or x.startswith('https:'):
        return x
    return os.path.abspath(os.path.expanduser(x))


def escape_sr_pattern(exp):
    return exp.replace('\n', '\ue123')


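The escape above maps embedded newlines to the private-use character U+E123 so that each search pattern occupies exactly one line in the search/replace file; `read_sr_patterns` below performs the inverse replacement when loading. A round-trip sketch (the `unescape_sr_pattern` helper is illustrative, not part of this module):

```python
def escape_sr_pattern(exp):
    # Newlines inside a pattern are stored as a private-use
    # character so the pattern fits on a single line on disk.
    return exp.replace('\n', '\ue123')

def unescape_sr_pattern(line):
    # Inverse of the escape, as done when the file is read back.
    return line.replace('\ue123', '\n')

pat = 'chapter\\s+\ntitle'
assert unescape_sr_pattern(escape_sr_pattern(pat)) == pat
```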
def read_sr_patterns(path, log=None):
    import json, re
    pats = []
    with open(path, 'rb') as f:
        lines = f.read().decode('utf-8').splitlines()
    pat = None
    for line in lines:
        if pat is None:
            if not line.strip():
                continue
            line = line.replace('\ue123', '\n')
            try:
                re.compile(line)
            except:
                msg = 'Invalid regular expression: %r from file: %r'%(
                        line, path)
                if log is not None:
                    log.error(msg)
                    raise SystemExit(1)
                else:
                    raise ValueError(msg)
            pat = line
        else:
            pats.append((pat, line))
            pat = None
    return json.dumps(pats)


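The file format parsed above is alternating lines: a search pattern, then its replacement, with blank lines allowed between pairs. A minimal mirror of just the pairing loop, without the file I/O and regex validation (the `parse_sr_lines` name is hypothetical):

```python
import json

def parse_sr_lines(lines):
    # First non-blank line is a search pattern, the line after it
    # is the replacement; blank separator lines are skipped.
    pats, pat = [], None
    for line in lines:
        if pat is None:
            if not line.strip():
                continue
            pat = line.replace('\ue123', '\n')
        else:
            pats.append((pat, line))
            pat = None
    return json.dumps(pats)

print(parse_sr_lines(['colou?r', 'color', '', 'grey', 'gray']))
# [["colou?r", "color"], ["grey", "gray"]]
```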
def main(args=sys.argv):
    log = Log()
    parser, plumber = create_option_parser(args, log)
    opts, leftover_args = parser.parse_args(args)
    if len(leftover_args) > 3:
        log.error('Extra arguments not understood:', u', '.join(leftover_args[3:]))
        return 1
    for x in ('read_metadata_from_opf', 'cover'):
        if getattr(opts, x, None) is not None:
            setattr(opts, x, abspath(getattr(opts, x)))
    if opts.search_replace:
        opts.search_replace = read_sr_patterns(opts.search_replace, log)
    if opts.transform_css_rules:
        from calibre.ebooks.css_transform_rules import import_rules, validate_rule
        with open(opts.transform_css_rules, 'rb') as tcr:
            opts.transform_css_rules = rules = list(import_rules(tcr.read()))
            for rule in rules:
                title, msg = validate_rule(rule)
                if title and msg:
                    log.error('Failed to parse CSS transform rules')
                    log.error(title)
                    log.error(msg)
                    return 1

    recommendations = [(n.dest, getattr(opts, n.dest),
                        OptionRecommendation.HIGH)
                       for n in parser.options_iter()
                       if n.dest]
    plumber.merge_ui_recommendations(recommendations)

    try:
        plumber.run()
    except ConversionUserFeedBack as e:
        ll = {'info': log.info, 'warn': log.warn,
              'error': log.error}.get(e.level, log.info)
        ll(e.title)
        if e.det_msg:
            log.debug(e.det_msg)
        ll(e.msg)
        raise SystemExit(1)

    log(_('Output saved to'), ' ', plumber.output)

    return 0


def manual_index_strings():
    return _('''\
The options and default values for the options change depending on both the
input and output formats, so you should always check with::

    %s

Below are the options that are common to all conversions, followed by the
options specific to every input and output format.''')


if __name__ == '__main__':
    sys.exit(main())
10
ebook_converter/ebooks/conversion/plugins/__init__.py
Normal file
@@ -0,0 +1,10 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
29
ebook_converter/ebooks/conversion/plugins/azw4_input.py
Normal file
@@ -0,0 +1,29 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

from calibre.customize.conversion import InputFormatPlugin
from polyglot.builtins import getcwd


class AZW4Input(InputFormatPlugin):

    name = 'AZW4 Input'
    author = 'John Schember'
    description = 'Convert AZW4 to HTML'
    file_types = {'azw4'}
    commit_name = 'azw4_input'

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from calibre.ebooks.pdb.header import PdbHeaderReader
        from calibre.ebooks.azw4.reader import Reader

        header = PdbHeaderReader(stream)
        reader = Reader(header, stream, log, options)
        opf = reader.extract_content(getcwd())

        return opf
202
ebook_converter/ebooks/conversion/plugins/chm_input.py
Normal file
@@ -0,0 +1,202 @@
from __future__ import absolute_import, division, print_function, unicode_literals

''' CHM File decoding support '''
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>,' \
                ' and Alex Bramley <a.bramley at gmail.com>.'

import os

from calibre.customize.conversion import InputFormatPlugin
from calibre.ptempfile import TemporaryDirectory
from calibre.constants import filesystem_encoding
from polyglot.builtins import unicode_type, as_bytes


class CHMInput(InputFormatPlugin):

    name = 'CHM Input'
    author = 'Kovid Goyal and Alex Bramley'
    description = 'Convert CHM files to OEB'
    file_types = {'chm'}
    commit_name = 'chm_input'

    def _chmtohtml(self, output_dir, chm_path, no_images, log, debug_dump=False):
        from calibre.ebooks.chm.reader import CHMReader
        log.debug('Opening CHM file')
        rdr = CHMReader(chm_path, log, input_encoding=self.opts.input_encoding)
        log.debug('Extracting CHM to %s' % output_dir)
        rdr.extract_content(output_dir, debug_dump=debug_dump)
        self._chm_reader = rdr
        return rdr.hhc_path

    def convert(self, stream, options, file_ext, log, accelerators):
        from calibre.ebooks.chm.metadata import get_metadata_from_reader
        from calibre.customize.ui import plugin_for_input_format
        self.opts = options

        log.debug('Processing CHM...')
        with TemporaryDirectory('_chm2oeb') as tdir:
            if not isinstance(tdir, unicode_type):
                tdir = tdir.decode(filesystem_encoding)
            html_input = plugin_for_input_format('html')
            for opt in html_input.options:
                setattr(options, opt.option.name, opt.recommended_value)
            no_images = False  # options.no_images
            chm_name = stream.name
            # chm_data = stream.read()

            # closing stream so CHM can be opened by external library
            stream.close()
            log.debug('tdir=%s' % tdir)
            log.debug('stream.name=%s' % stream.name)
            debug_dump = False
            odi = options.debug_pipeline
            if odi:
                debug_dump = os.path.join(odi, 'input')
            mainname = self._chmtohtml(tdir, chm_name, no_images, log,
                                       debug_dump=debug_dump)
            mainpath = os.path.join(tdir, mainname)

            try:
                metadata = get_metadata_from_reader(self._chm_reader)
            except Exception:
                log.exception('Failed to read metadata, using filename')
                from calibre.ebooks.metadata.book.base import Metadata
                metadata = Metadata(os.path.basename(chm_name))
            encoding = self._chm_reader.get_encoding() or options.input_encoding or 'cp1252'
            self._chm_reader.CloseCHM()
            # print((tdir, mainpath))
            # from calibre import ipython
            # ipython()

            options.debug_pipeline = None
            options.input_encoding = 'utf-8'
            uenc = encoding
            if os.path.abspath(mainpath) in self._chm_reader.re_encoded_files:
                uenc = 'utf-8'
            htmlpath, toc = self._create_html_root(mainpath, log, uenc)
            oeb = self._create_oebbook_html(htmlpath, tdir, options, log, metadata)
            options.debug_pipeline = odi
            if toc.count() > 1:
                oeb.toc = self.parse_html_toc(oeb.spine[0])
                oeb.manifest.remove(oeb.spine[0])
                oeb.auto_generated_toc = False
            return oeb

    def parse_html_toc(self, item):
        from calibre.ebooks.oeb.base import TOC, XPath
        dx = XPath('./h:div')
        ax = XPath('./h:a[1]')

        def do_node(parent, div):
            for child in dx(div):
                a = ax(child)[0]
                c = parent.add(a.text, a.attrib['href'])
                do_node(c, child)

        toc = TOC()
        root = XPath('//h:div[1]')(item.data)[0]
        do_node(toc, root)
        return toc

    def _create_oebbook_html(self, htmlpath, basedir, opts, log, mi):
        # use HTMLInput plugin to generate book
        from calibre.customize.builtins import HTMLInput
        opts.breadth_first = True
        htmlinput = HTMLInput(None)
        oeb = htmlinput.create_oebbook(htmlpath, basedir, opts, log, mi)
        return oeb

    def _create_html_root(self, hhcpath, log, encoding):
        from lxml import html
        from polyglot.urllib import unquote as _unquote
        from calibre.ebooks.oeb.base import urlquote
        from calibre.ebooks.chardet import xml_to_unicode
        hhcdata = self._read_file(hhcpath)
        hhcdata = hhcdata.decode(encoding)
        hhcdata = xml_to_unicode(hhcdata, verbose=True,
                                 strip_encoding_pats=True, resolve_entities=True)[0]
        hhcroot = html.fromstring(hhcdata)
        toc = self._process_nodes(hhcroot)
        # print("=============================")
        # print("Printing hhcroot")
        # print(etree.tostring(hhcroot, pretty_print=True))
        # print("=============================")
        log.debug('Found %d section nodes' % toc.count())
        htmlpath = os.path.splitext(hhcpath)[0] + ".html"
        base = os.path.dirname(os.path.abspath(htmlpath))

        def unquote(x):
            if isinstance(x, unicode_type):
                x = x.encode('utf-8')
            return _unquote(x).decode('utf-8')

        def unquote_path(x):
            y = unquote(x)
            if (not os.path.exists(os.path.join(base, x)) and os.path.exists(os.path.join(base, y))):
                x = y
            return x

        def donode(item, parent, base, subpath):
            for child in item:
                title = child.title
                if not title:
                    continue
                raw = unquote_path(child.href or '')
                rsrcname = os.path.basename(raw)
                rsrcpath = os.path.join(subpath, rsrcname)
                if (not os.path.exists(os.path.join(base, rsrcpath)) and os.path.exists(os.path.join(base, raw))):
                    rsrcpath = raw

                if '%' not in rsrcpath:
                    rsrcpath = urlquote(rsrcpath)
                if not raw:
                    rsrcpath = ''
                c = DIV(A(title, href=rsrcpath))
                donode(child, c, base, subpath)
                parent.append(c)

        with open(htmlpath, 'wb') as f:
            if toc.count() > 1:
                from lxml.html.builder import HTML, BODY, DIV, A
                path0 = toc[0].href
                path0 = unquote_path(path0)
                subpath = os.path.dirname(path0)
                base = os.path.dirname(f.name)
                root = DIV()
                donode(toc, root, base, subpath)
                raw = html.tostring(HTML(BODY(root)), encoding='utf-8',
                                    pretty_print=True)
                f.write(raw)
            else:
                f.write(as_bytes(hhcdata))
        return htmlpath, toc

    def _read_file(self, name):
        with lopen(name, 'rb') as f:
            data = f.read()
        return data

    def add_node(self, node, toc, ancestor_map):
        from calibre.ebooks.chm.reader import match_string
        if match_string(node.attrib.get('type', ''), 'text/sitemap'):
            p = node.xpath('ancestor::ul[1]/ancestor::li[1]/object[1]')
            parent = p[0] if p else None
            toc = ancestor_map.get(parent, toc)
            title = href = ''
            for param in node.xpath('./param'):
                if match_string(param.attrib['name'], 'name'):
                    title = param.attrib['value']
                elif match_string(param.attrib['name'], 'local'):
                    href = param.attrib['value']
            child = toc.add(title or _('Unknown'), href)
            ancestor_map[node] = child

    def _process_nodes(self, root):
        from calibre.ebooks.oeb.base import TOC
        toc = TOC()
        ancestor_map = {}
        for node in root.xpath('//object'):
            self.add_node(node, toc, ancestor_map)
        return toc
310
ebook_converter/ebooks/conversion/plugins/comic_input.py
Normal file
@@ -0,0 +1,310 @@
from __future__ import absolute_import, division, print_function, unicode_literals
|
||||||
|
|
||||||
|
__license__ = 'GPL v3'
|
||||||
|
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
|
||||||
|
__docformat__ = 'restructuredtext en'
|
||||||
|
|
||||||
|
'''
|
||||||
|
Based on ideas from comiclrf created by FangornUK.
|
||||||
|
'''
|
||||||
|
|
||||||
|
import shutil, textwrap, codecs, os
|
||||||
|
|
||||||
|
from calibre.customize.conversion import InputFormatPlugin, OptionRecommendation
|
||||||
|
from calibre import CurrentDir
|
||||||
|
from calibre.ptempfile import PersistentTemporaryDirectory
|
||||||
|
from polyglot.builtins import getcwd, map
|
||||||
|
|
||||||
|
|
||||||
|
class ComicInput(InputFormatPlugin):
|
||||||
|
|
||||||
|
name = 'Comic Input'
|
||||||
|
author = 'Kovid Goyal'
|
||||||
|
description = 'Optimize comic files (.cbz, .cbr, .cbc) for viewing on portable devices'
|
||||||
|
file_types = {'cbz', 'cbr', 'cbc'}
|
||||||
|
is_image_collection = True
|
||||||
|
commit_name = 'comic_input'
|
||||||
|
core_usage = -1
|
||||||
|
|
||||||
|
options = {
|
||||||
|
OptionRecommendation(name='colors', recommended_value=0,
|
||||||
|
help=_('Reduce the number of colors used in the image. This works only'
|
||||||
|
' if you choose the PNG output format. It is useful to reduce file sizes.'
|
||||||
|
' Set to zero to turn off. Maximum value is 256. It is off by default.')),
|
||||||
|
OptionRecommendation(name='dont_normalize', recommended_value=False,
|
||||||
|
help=_('Disable normalize (improve contrast) color range '
|
||||||
|
'for pictures. Default: False')),
|
||||||
|
OptionRecommendation(name='keep_aspect_ratio', recommended_value=False,
|
||||||
|
help=_('Maintain picture aspect ratio. Default is to fill the screen.')),
|
||||||
|
OptionRecommendation(name='dont_sharpen', recommended_value=False,
|
||||||
|
help=_('Disable sharpening.')),
|
||||||
|
OptionRecommendation(name='disable_trim', recommended_value=False,
|
||||||
|
help=_('Disable trimming of comic pages. For some comics, '
|
||||||
|
'trimming might remove content as well as borders.')),
|
||||||
|
OptionRecommendation(name='landscape', recommended_value=False,
|
||||||
|
help=_("Don't split landscape images into two portrait images")),
|
||||||
|
OptionRecommendation(name='wide', recommended_value=False,
|
||||||
|
help=_("Keep aspect ratio and scale image using screen height as "
|
||||||
|
"image width for viewing in landscape mode.")),
|
||||||
|
OptionRecommendation(name='right2left', recommended_value=False,
|
||||||
|
help=_('Used for right-to-left publications like manga. '
|
||||||
|
'Causes landscape pages to be split into portrait pages '
|
||||||
|
'from right to left.')),
|
||||||
|
OptionRecommendation(name='despeckle', recommended_value=False,
|
||||||
|
help=_('Enable Despeckle. Reduces speckle noise. '
|
||||||
|
'May greatly increase processing time.')),
|
||||||
|
OptionRecommendation(name='no_sort', recommended_value=False,
|
||||||
|
help=_("Don't sort the files found in the comic "
|
||||||
|
"alphabetically by name. Instead use the order they were "
|
||||||
|
"added to the comic.")),
|
||||||
|
OptionRecommendation(name='output_format', choices=['png', 'jpg'],
|
||||||
|
recommended_value='png', help=_('The format that images in the created e-book '
|
||||||
|
'are converted to. You can experiment to see which format gives '
|
||||||
|
'you optimal size and look on your device.')),
|
||||||
|
OptionRecommendation(name='no_process', recommended_value=False,
|
||||||
|
help=_("Apply no processing to the image")),
|
||||||
|
OptionRecommendation(name='dont_grayscale', recommended_value=False,
|
||||||
|
help=_('Do not convert the image to grayscale (black and white)')),
|
||||||
|
OptionRecommendation(name='comic_image_size', recommended_value=None,
|
||||||
|
help=_('Specify the image size as widthxheight pixels. Normally,'
|
||||||
|
' an image size is automatically calculated from the output '
|
||||||
|
'profile, this option overrides it.')),
|
||||||
|
OptionRecommendation(name='dont_add_comic_pages_to_toc', recommended_value=False,
|
||||||
|
help=_('When converting a CBC do not add links to each page to'
|
||||||
|
' the TOC. Note this only applies if the TOC has more than one'
|
||||||
|
' section')),
|
||||||
|
}
|
||||||
|
|
||||||
|
recommendations = {
|
||||||
|
('margin_left', 0, OptionRecommendation.HIGH),
|
||||||
|
('margin_top', 0, OptionRecommendation.HIGH),
|
||||||
|
('margin_right', 0, OptionRecommendation.HIGH),
|
||||||
|
('margin_bottom', 0, OptionRecommendation.HIGH),
|
||||||
|
('insert_blank_line', False, OptionRecommendation.HIGH),
|
||||||
|
('remove_paragraph_spacing', False, OptionRecommendation.HIGH),
|
||||||
|
('change_justification', 'left', OptionRecommendation.HIGH),
|
||||||
|
('dont_split_on_pagebreaks', True, OptionRecommendation.HIGH),
|
||||||
|
('chapter', None, OptionRecommendation.HIGH),
|
||||||
|
('page_breaks_brefore', None, OptionRecommendation.HIGH),
|
||||||
|
('use_auto_toc', False, OptionRecommendation.HIGH),
|
||||||
|
('page_breaks_before', None, OptionRecommendation.HIGH),
|
||||||
|
('disable_font_rescaling', True, OptionRecommendation.HIGH),
|
||||||
|
('linearize_tables', False, OptionRecommendation.HIGH),
|
||||||
|
}
|
||||||
|
|
||||||
|
def get_comics_from_collection(self, stream):
|
||||||
|
from calibre.libunzip import extract as zipextract
|
||||||
|
tdir = PersistentTemporaryDirectory('_comic_collection')
|
||||||
|
zipextract(stream, tdir)
|
||||||
|
comics = []
|
||||||
|
with CurrentDir(tdir):
|
||||||
|
if not os.path.exists('comics.txt'):
|
||||||
|
raise ValueError((
|
||||||
|
'%s is not a valid comic collection'
|
||||||
|
' no comics.txt was found in the file')
|
||||||
|
%stream.name)
|
||||||
|
with open('comics.txt', 'rb') as f:
|
||||||
|
raw = f.read()
|
||||||
|
if raw.startswith(codecs.BOM_UTF16_BE):
|
||||||
|
raw = raw.decode('utf-16-be')[1:]
|
||||||
|
elif raw.startswith(codecs.BOM_UTF16_LE):
|
||||||
|
raw = raw.decode('utf-16-le')[1:]
|
||||||
|
elif raw.startswith(codecs.BOM_UTF8):
|
||||||
|
raw = raw.decode('utf-8')[1:]
|
||||||
|
else:
|
||||||
|
raw = raw.decode('utf-8')
|
||||||
|
for line in raw.splitlines():
|
||||||
|
line = line.strip()
|
||||||
|
if not line:
|
||||||
|
continue
|
||||||
|
fname, title = line.partition(':')[0], line.partition(':')[-1]
|
||||||
|
fname = fname.replace('#', '_')
|
||||||
|
fname = os.path.join(tdir, *fname.split('/'))
|
||||||
|
if not title:
|
||||||
|
title = os.path.basename(fname).rpartition('.')[0]
|
||||||
|
if os.access(fname, os.R_OK):
|
||||||
|
comics.append([title, fname])
|
||||||
|
if not comics:
|
||||||
|
raise ValueError('%s has no comics'%stream.name)
|
||||||
|
return comics
|
||||||
|
|
||||||
|
def get_pages(self, comic, tdir2):
|
||||||
|
from calibre.ebooks.comic.input import (extract_comic, process_pages,
|
||||||
|
find_pages)
|
||||||
|
tdir = extract_comic(comic)
|
||||||
|
new_pages = find_pages(tdir, sort_on_mtime=self.opts.no_sort,
|
||||||
|
verbose=self.opts.verbose)
|
||||||
|
thumbnail = None
|
||||||
|
if not new_pages:
|
||||||
|
raise ValueError('Could not find any pages in the comic: %s'
|
||||||
|
%comic)
|
||||||
|
if self.opts.no_process:
|
||||||
|
n2 = []
|
||||||
|
for i, page in enumerate(new_pages):
|
||||||
|
n2.append(os.path.join(tdir2, '{} - {}' .format(i, os.path.basename(page))))
|
||||||
|
shutil.copyfile(page, n2[-1])
|
||||||
|
new_pages = n2
|
||||||
|
else:
|
||||||
|
new_pages, failures = process_pages(new_pages, self.opts,
|
||||||
|
self.report_progress, tdir2)
|
||||||
|
if failures:
|
||||||
|
self.log.warning('Could not process the following pages '
|
||||||
|
'(run with --verbose to see why):')
|
||||||
|
for f in failures:
|
||||||
|
self.log.warning('\t', f)
|
||||||
|
if not new_pages:
|
||||||
|
raise ValueError('Could not find any valid pages in comic: %s'
|
||||||
|
% comic)
|
||||||
|
thumbnail = os.path.join(tdir2,
|
||||||
|
'thumbnail.'+self.opts.output_format.lower())
|
||||||
|
if not os.access(thumbnail, os.R_OK):
|
||||||
|
thumbnail = None
|
||||||
|
return new_pages
|
||||||
|
|
||||||
|
def get_images(self):
|
||||||
|
return self._images
|
||||||
|
|
||||||
|
def convert(self, stream, opts, file_ext, log, accelerators):
|
||||||
|
from calibre.ebooks.metadata import MetaInformation
|
||||||
|
from calibre.ebooks.metadata.opf2 import OPFCreator
|
||||||
|
from calibre.ebooks.metadata.toc import TOC
|
||||||
|
|
||||||
|
self.opts, self.log= opts, log
|
||||||
|
if file_ext == 'cbc':
|
||||||
|
comics_ = self.get_comics_from_collection(stream)
|
||||||
|
else:
|
||||||
|
comics_ = [['Comic', os.path.abspath(stream.name)]]
|
||||||
|
stream.close()
|
||||||
|
comics = []
|
||||||
|
for i, x in enumerate(comics_):
|
||||||
|
title, fname = x
|
||||||
|
cdir = 'comic_%d'%(i+1) if len(comics_) > 1 else '.'
|
||||||
|
cdir = os.path.abspath(cdir)
|
||||||
|
if not os.path.exists(cdir):
|
||||||
|
os.makedirs(cdir)
|
||||||
|
pages = self.get_pages(fname, cdir)
|
||||||
|
if not pages:
|
||||||
|
continue
|
||||||
|
if self.for_viewer:
|
||||||
|
comics.append((title, pages, [self.create_viewer_wrapper(pages)]))
|
||||||
|
else:
|
||||||
|
wrappers = self.create_wrappers(pages)
|
||||||
|
comics.append((title, pages, wrappers))
|
||||||
|
|
||||||
|
if not comics:
|
||||||
|
raise ValueError('No comic pages found in %s'%stream.name)
|
||||||
|
|
||||||
|
mi = MetaInformation(os.path.basename(stream.name).rpartition('.')[0],
|
||||||
|
[_('Unknown')])
|
||||||
|
opf = OPFCreator(getcwd(), mi)
|
||||||
|
entries = []
|
||||||
|
|
||||||
|
def href(x):
|
||||||
|
if len(comics) == 1:
|
||||||
|
return os.path.basename(x)
|
||||||
|
return '/'.join(x.split(os.sep)[-2:])
|
||||||
|
|
||||||
|
cover_href = None
|
||||||
|
for comic in comics:
|
||||||
|
pages, wrappers = comic[1:]
|
||||||
|
page_entries = [(x, None) for x in map(href, pages)]
|
||||||
|
entries += [(w, None) for w in map(href, wrappers)] + page_entries
|
||||||
|
if cover_href is None and page_entries:
|
||||||
|
cover_href = page_entries[0][0]
|
||||||
|
opf.create_manifest(entries)
|
||||||
|
spine = []
|
||||||
|
for comic in comics:
|
||||||
|
spine.extend(map(href, comic[2]))
|
||||||
|
self._images = []
|
||||||
|
for comic in comics:
|
||||||
|
self._images.extend(comic[1])
|
||||||
|
opf.create_spine(spine)
|
||||||
|
if self.for_viewer and cover_href:
|
||||||
|
opf.guide.set_cover(cover_href)
|
||||||
|
toc = TOC()
|
||||||
|
if len(comics) == 1:
|
||||||
|
wrappers = comics[0][2]
|
||||||
|
for i, x in enumerate(wrappers):
|
||||||
|
                toc.add_item(href(x), None, _('Page')+' %d'%(i+1),
                        play_order=i)
        else:
            po = 0
            for comic in comics:
                po += 1
                wrappers = comic[2]
                stoc = toc.add_item(href(wrappers[0]),
                        None, comic[0], play_order=po)
                if not opts.dont_add_comic_pages_to_toc:
                    for i, x in enumerate(wrappers):
                        stoc.add_item(href(x), None,
                                _('Page')+' %d'%(i+1), play_order=po)
                        po += 1
        opf.set_toc(toc)
        with open('metadata.opf', 'wb') as m, open('toc.ncx', 'wb') as n:
            opf.render(m, n, 'toc.ncx')
        return os.path.abspath('metadata.opf')

    def create_wrappers(self, pages):
        from calibre.ebooks.oeb.base import XHTML_NS
        wrappers = []
        WRAPPER = textwrap.dedent('''\
        <html xmlns="%s">
            <head>
                <meta charset="utf-8"/>
                <title>Page #%d</title>
                <style type="text/css">
                @page { margin:0pt; padding: 0pt}
                body { margin: 0pt; padding: 0pt}
                div { text-align: center }
                </style>
            </head>
            <body>
                <div>
                    <img src="%s" alt="comic page #%d" />
                </div>
            </body>
        </html>
        ''')
        dir = os.path.dirname(pages[0])
        for i, page in enumerate(pages):
            wrapper = WRAPPER%(XHTML_NS, i+1, os.path.basename(page), i+1)
            page = os.path.join(dir, 'page_%d.xhtml'%(i+1))
            with open(page, 'wb') as f:
                f.write(wrapper.encode('utf-8'))
            wrappers.append(page)
        return wrappers

    def create_viewer_wrapper(self, pages):
        from calibre.ebooks.oeb.base import XHTML_NS

        def page(src):
            return '<img src="{}"></img>'.format(os.path.basename(src))

        # Compute the output directory before `pages` is rebound to the
        # joined markup string below; otherwise dirname(pages[0]) would be
        # applied to the first character of that string.
        base = os.path.dirname(pages[0])
        pages = '\n'.join(map(page, pages))
        wrapper = '''
        <html xmlns="%s">
            <head>
                <meta charset="utf-8"/>
                <style type="text/css">
                html, body, img { height: 100vh; display: block; margin: 0; padding: 0; border-width: 0; }
                img {
                    width: 100%%; height: 100%%;
                    object-fit: contain;
                    margin-left: auto; margin-right: auto;
                    max-width: 100vw; max-height: 100vh;
                    top: 50vh; transform: translateY(-50%%);
                    position: relative;
                    page-break-after: always;
                }
                </style>
            </head>
            <body>
            %s
            </body>
        </html>
        ''' % (XHTML_NS, pages)
        path = os.path.join(base, 'wrapper.xhtml')
        with open(path, 'wb') as f:
            f.write(wrapper.encode('utf-8'))
        return path
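The per-page wrapper substitution used by `create_wrappers` can be exercised in isolation. This is a standalone sketch, not calibre code: the template is abbreviated, `XHTML_NS` is hard-coded instead of imported, and `wrap_page` is a name we made up for illustration.

```python
import textwrap

# The XHTML namespace that calibre's oeb.base module exposes as XHTML_NS.
XHTML_NS = 'http://www.w3.org/1999/xhtml'

WRAPPER = textwrap.dedent('''\
<html xmlns="%s">
    <head><title>Page #%d</title></head>
    <body><div><img src="%s" alt="comic page #%d" /></div></body>
</html>
''')

def wrap_page(index, image_name):
    # Same substitution order as create_wrappers: namespace, 1-based page
    # number, image file name, page number again (for the alt text).
    return WRAPPER % (XHTML_NS, index + 1, image_name, index + 1)

print(wrap_page(0, 'page_1.png'))
```

Each comic page image thus gets its own tiny XHTML file, which is what allows the TOC code above to point a `Page N` entry at every image.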
67
ebook_converter/ebooks/conversion/plugins/djvu_input.py
Normal file
@@ -0,0 +1,67 @@
# -*- coding: utf-8 -*-

from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2011, Anthon van der Neut <anthon@mnt.org>'
__docformat__ = 'restructuredtext en'

import os
from io import BytesIO

from calibre.customize.conversion import InputFormatPlugin
from polyglot.builtins import getcwd


class DJVUInput(InputFormatPlugin):

    name = 'DJVU Input'
    author = 'Anthon van der Neut'
    description = 'Convert OCR-ed DJVU files (.djvu) to HTML'
    file_types = {'djvu', 'djv'}
    commit_name = 'djvu_input'

    def convert(self, stream, options, file_ext, log, accelerators):
        from calibre.ebooks.txt.processor import convert_basic

        stdout = BytesIO()
        from calibre.ebooks.djvu.djvu import DJVUFile
        x = DJVUFile(stream)
        x.get_text(stdout)
        raw_text = stdout.getvalue()
        if not raw_text:
            raise ValueError('The DJVU file contains no text, only images, probably page scans.'
                ' calibre only supports conversion of DJVU files with actual text in them.')

        html = convert_basic(raw_text.replace(b"\n", b' ').replace(
            b'\037', b'\n\n'))
        # Run the HTMLized text through the html processing plugin.
        from calibre.customize.ui import plugin_for_input_format
        html_input = plugin_for_input_format('html')
        for opt in html_input.options:
            setattr(options, opt.option.name, opt.recommended_value)
        options.input_encoding = 'utf-8'
        base = getcwd()
        htmlfile = os.path.join(base, 'index.html')
        c = 0
        while os.path.exists(htmlfile):
            c += 1
            htmlfile = os.path.join(base, 'index%d.html'%c)
        with open(htmlfile, 'wb') as f:
            f.write(html.encode('utf-8'))
        odi = options.debug_pipeline
        options.debug_pipeline = None
        # Generate oeb from html conversion.
        with open(htmlfile, 'rb') as f:
            oeb = html_input.convert(f, options, 'html', log,
                {})
        options.debug_pipeline = odi
        os.remove(htmlfile)

        # Set metadata from file.
        from calibre.customize.ui import get_file_type_metadata
        from calibre.ebooks.oeb.transforms.metadata import meta_info_to_oeb_metadata
        mi = get_file_type_metadata(stream, file_ext)
        meta_info_to_oeb_metadata(mi, oeb.metadata, log)

        return oeb
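The byte-level cleanup step in `DJVUInput.convert` can be checked on its own: hard newlines inside a page become spaces, and the `\037` (unit separator) bytes that DJVU text extraction emits become blank-line paragraph breaks before the text is HTMLized. A minimal sketch with made-up input (the helper name is ours, not calibre's):

```python
def normalize_djvu_text(raw_text):
    # Mirrors the replace chain in DJVUInput.convert: newlines within a page
    # collapse to spaces, while \037 separators become paragraph breaks that
    # convert_basic() later turns into <p> boundaries.
    return raw_text.replace(b"\n", b" ").replace(b"\037", b"\n\n")

sample = b"line one\nline two\037next page"
print(normalize_djvu_text(sample))
```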
34
ebook_converter/ebooks/conversion/plugins/docx_input.py
Normal file
@@ -0,0 +1,34 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

from calibre.customize.conversion import InputFormatPlugin, OptionRecommendation


class DOCXInput(InputFormatPlugin):
    name = 'DOCX Input'
    author = 'Kovid Goyal'
    description = _('Convert DOCX files (.docx and .docm) to HTML')
    file_types = {'docx', 'docm'}
    commit_name = 'docx_input'

    options = {
        OptionRecommendation(name='docx_no_cover', recommended_value=False,
            help=_('Normally, if a large image is present at the start of the document that looks like a cover, '
                'it will be removed from the document and used as the cover for created e-book. This option '
                'turns off that behavior.')),
        OptionRecommendation(name='docx_no_pagebreaks_between_notes', recommended_value=False,
            help=_('Do not insert a page break after every endnote.')),
        OptionRecommendation(name='docx_inline_subsup', recommended_value=False,
            help=_('Render superscripts and subscripts so that they do not affect the line height.')),
    }

    recommendations = {('page_breaks_before', '/', OptionRecommendation.MED)}

    def convert(self, stream, options, file_ext, log, accelerators):
        from calibre.ebooks.docx.to_html import Convert
        return Convert(stream, detect_cover=not options.docx_no_cover, log=log, notes_nopb=options.docx_no_pagebreaks_between_notes,
                nosupsub=options.docx_inline_subsup)()
93
ebook_converter/ebooks/conversion/plugins/docx_output.py
Normal file
@@ -0,0 +1,93 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

from calibre.customize.conversion import OutputFormatPlugin, OptionRecommendation

PAGE_SIZES = ['a0', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'b0', 'b1',
              'b2', 'b3', 'b4', 'b5', 'b6', 'legal', 'letter']


class DOCXOutput(OutputFormatPlugin):

    name = 'DOCX Output'
    author = 'Kovid Goyal'
    file_type = 'docx'
    commit_name = 'docx_output'
    ui_data = {'page_sizes': PAGE_SIZES}

    options = {
        OptionRecommendation(name='docx_page_size', recommended_value='letter',
            level=OptionRecommendation.LOW, choices=PAGE_SIZES,
            help=_('The size of the page. Default is letter. Choices '
                'are %s') % PAGE_SIZES),

        OptionRecommendation(name='docx_custom_page_size', recommended_value=None,
            help=_('Custom size of the document. Use the form widthxheight '
                'EG. `123x321` to specify the width and height (in pts). '
                'This overrides any specified page-size.')),

        OptionRecommendation(name='docx_no_cover', recommended_value=False,
            help=_('Do not insert the book cover as an image at the start of the document.'
                ' If you use this option, the book cover will be discarded.')),

        OptionRecommendation(name='preserve_cover_aspect_ratio', recommended_value=False,
            help=_('Preserve the aspect ratio of the cover image instead of stretching'
                ' it out to cover the entire page.')),

        OptionRecommendation(name='docx_no_toc', recommended_value=False,
            help=_('Do not insert the table of contents as a page at the start of the document.')),

        OptionRecommendation(name='extract_to',
            help=_('Extract the contents of the generated %s file to the '
                'specified directory. The contents of the directory are first '
                'deleted, so be careful.') % 'DOCX'),

        OptionRecommendation(name='docx_page_margin_left', recommended_value=72.0,
            level=OptionRecommendation.LOW,
            help=_('The size of the left page margin, in pts. Default is 72pt.'
                ' Overrides the common left page margin setting.')
        ),

        OptionRecommendation(name='docx_page_margin_top', recommended_value=72.0,
            level=OptionRecommendation.LOW,
            help=_('The size of the top page margin, in pts. Default is 72pt.'
                ' Overrides the common top page margin setting, unless set to zero.')
        ),

        OptionRecommendation(name='docx_page_margin_right', recommended_value=72.0,
            level=OptionRecommendation.LOW,
            help=_('The size of the right page margin, in pts. Default is 72pt.'
                ' Overrides the common right page margin setting, unless set to zero.')
        ),

        OptionRecommendation(name='docx_page_margin_bottom', recommended_value=72.0,
            level=OptionRecommendation.LOW,
            help=_('The size of the bottom page margin, in pts. Default is 72pt.'
                ' Overrides the common bottom page margin setting, unless set to zero.')
        ),

    }

    def convert_metadata(self, oeb):
        from lxml import etree
        from calibre.ebooks.oeb.base import OPF, OPF2_NS
        from calibre.ebooks.metadata.opf2 import OPF as ReadOPF
        from io import BytesIO
        package = etree.Element(OPF('package'), attrib={'version': '2.0'}, nsmap={None: OPF2_NS})
        oeb.metadata.to_opf2(package)
        self.mi = ReadOPF(BytesIO(etree.tostring(package, encoding='utf-8')), populate_spine=False, try_to_guess_cover=False).to_book_metadata()

    def convert(self, oeb, output_path, input_plugin, opts, log):
        from calibre.ebooks.docx.writer.container import DOCX
        from calibre.ebooks.docx.writer.from_html import Convert
        docx = DOCX(opts, log)
        self.convert_metadata(oeb)
        Convert(oeb, docx, self.mi, not opts.docx_no_cover, not opts.docx_no_toc)()
        docx.write(output_path, self.mi)
        if opts.extract_to:
            from calibre.ebooks.docx.dump import do_dump
            do_dump(output_path, opts.extract_to)
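The `docx_custom_page_size` help text above promises a `widthxheight` form such as `123x321`, with both values in points. A sketch of the obvious parse of that format; `parse_custom_page_size` is a hypothetical helper for illustration, not the parser calibre itself uses:

```python
def parse_custom_page_size(spec):
    # Splits a 'widthxheight' string such as '123x321' into two floats (pts).
    # Hypothetical helper; the real option is consumed elsewhere in the
    # DOCX writer.
    width, _, height = spec.partition('x')
    return float(width), float(height)

print(parse_custom_page_size('123x321'))
```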
438
ebook_converter/ebooks/conversion/plugins/epub_input.py
Normal file
@@ -0,0 +1,438 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import os, re, posixpath
from itertools import cycle

from calibre.customize.conversion import InputFormatPlugin, OptionRecommendation
from polyglot.builtins import getcwd

ADOBE_OBFUSCATION = 'http://ns.adobe.com/pdf/enc#RC'
IDPF_OBFUSCATION = 'http://www.idpf.org/2008/embedding'


def decrypt_font_data(key, data, algorithm):
    is_adobe = algorithm == ADOBE_OBFUSCATION
    crypt_len = 1024 if is_adobe else 1040
    crypt = bytearray(data[:crypt_len])
    key = cycle(iter(bytearray(key)))
    decrypt = bytes(bytearray(x^next(key) for x in crypt))
    return decrypt + data[crypt_len:]


def decrypt_font(key, path, algorithm):
    with lopen(path, 'r+b') as f:
        data = decrypt_font_data(key, f.read(), algorithm)
        f.seek(0), f.truncate(), f.write(data)
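Because font obfuscation is a plain XOR of the first 1024 (Adobe) or 1040 (IDPF) bytes against a cycled key, `decrypt_font_data` is its own inverse: running it twice restores the original bytes. A self-contained sketch with a toy key and fake font data (the function body is reproduced from the listing so it runs standalone):

```python
from itertools import cycle

ADOBE_OBFUSCATION = 'http://ns.adobe.com/pdf/enc#RC'

def decrypt_font_data(key, data, algorithm):
    # Same logic as the plugin: XOR the obfuscated prefix with the cycled
    # key bytes; the rest of the file is passed through untouched.
    crypt_len = 1024 if algorithm == ADOBE_OBFUSCATION else 1040
    crypt = bytearray(data[:crypt_len])
    k = cycle(iter(bytearray(key)))
    return bytes(bytearray(x ^ next(k) for x in crypt)) + data[crypt_len:]

font = bytes(range(256)) * 8          # 2048 bytes of fake font data
key = b'0123456789abcdef'             # toy key, real keys come from the OPF
obfuscated = decrypt_font_data(key, font, ADOBE_OBFUSCATION)
assert decrypt_font_data(key, obfuscated, ADOBE_OBFUSCATION) == font
```

This round-trip property is why the same function serves for both "encrypting" and "decrypting" obfuscated fonts.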


class EPUBInput(InputFormatPlugin):

    name = 'EPUB Input'
    author = 'Kovid Goyal'
    description = 'Convert EPUB files (.epub) to HTML'
    file_types = {'epub'}
    output_encoding = None
    commit_name = 'epub_input'

    recommendations = {('page_breaks_before', '/', OptionRecommendation.MED)}

    def process_encryption(self, encfile, opf, log):
        from lxml import etree
        import uuid, hashlib
        idpf_key = opf.raw_unique_identifier
        if idpf_key:
            idpf_key = re.sub('[\u0020\u0009\u000d\u000a]', '', idpf_key)
            idpf_key = hashlib.sha1(idpf_key.encode('utf-8')).digest()
        key = None
        for item in opf.identifier_iter():
            scheme = None
            for xkey in item.attrib.keys():
                if xkey.endswith('scheme'):
                    scheme = item.get(xkey)
            if (scheme and scheme.lower() == 'uuid') or \
                    (item.text and item.text.startswith('urn:uuid:')):
                try:
                    key = item.text.rpartition(':')[-1]
                    key = uuid.UUID(key).bytes
                except:
                    import traceback
                    traceback.print_exc()
                    key = None

        try:
            root = etree.parse(encfile)
            for em in root.xpath('descendant::*[contains(name(), "EncryptionMethod")]'):
                algorithm = em.get('Algorithm', '')
                if algorithm not in {ADOBE_OBFUSCATION, IDPF_OBFUSCATION}:
                    return False
                cr = em.getparent().xpath('descendant::*[contains(name(), "CipherReference")]')[0]
                uri = cr.get('URI')
                path = os.path.abspath(os.path.join(os.path.dirname(encfile), '..', *uri.split('/')))
                tkey = (key if algorithm == ADOBE_OBFUSCATION else idpf_key)
                if (tkey and os.path.exists(path)):
                    self._encrypted_font_uris.append(uri)
                    decrypt_font(tkey, path, algorithm)
            return True
        except:
            import traceback
            traceback.print_exc()
        return False
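The IDPF key derivation used in `process_encryption` is just SHA-1 over the package's unique identifier with space, tab, CR, and LF removed. A self-contained sketch (the identifier value below is made up):

```python
import hashlib
import re

def idpf_font_key(unique_identifier):
    # Mirrors process_encryption: strip the four whitespace characters the
    # IDPF algorithm ignores, then take the SHA-1 digest as the XOR key.
    cleaned = re.sub('[\u0020\u0009\u000d\u000a]', '', unique_identifier)
    return hashlib.sha1(cleaned.encode('utf-8')).digest()

key = idpf_font_key('urn:uuid:12345678-1234-1234-1234-123456789abc')
print(len(key))  # SHA-1 digests are 20 bytes
```

The Adobe variant instead uses the raw 16 bytes of the package UUID, which is why the code above keeps two separate keys and picks one per `Algorithm` attribute.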

    def set_guide_type(self, opf, gtype, href=None, title=''):
        # Set the specified guide entry
        for elem in list(opf.iterguide()):
            if elem.get('type', '').lower() == gtype:
                elem.getparent().remove(elem)

        if href is not None:
            t = opf.create_guide_item(gtype, title, href)
            for guide in opf.root.xpath('./*[local-name()="guide"]'):
                guide.append(t)
                return
            guide = opf.create_guide_element()
            opf.root.append(guide)
            guide.append(t)
            return t

    def rationalize_cover3(self, opf, log):
        ''' If there is a reference to the cover/titlepage via manifest properties, convert to
        entries in the <guide> so that the rest of the pipeline picks it up. '''
        from calibre.ebooks.metadata.opf3 import items_with_property
        removed = guide_titlepage_href = guide_titlepage_id = None

        # Look for titlepages incorrectly marked in the <guide> as covers
        guide_cover, guide_elem = None, None
        for guide_elem in opf.iterguide():
            if guide_elem.get('type', '').lower() == 'cover':
                guide_cover = guide_elem.get('href', '').partition('#')[0]
                break
        if guide_cover:
            spine = list(opf.iterspine())
            if spine:
                idref = spine[0].get('idref', '')
                for x in opf.itermanifest():
                    if x.get('id') == idref and x.get('href') == guide_cover:
                        guide_titlepage_href = guide_cover
                        guide_titlepage_id = idref
                        break

        raster_cover_href = opf.epub3_raster_cover or opf.raster_cover
        if raster_cover_href:
            self.set_guide_type(opf, 'cover', raster_cover_href, 'Cover Image')
        titlepage_id = titlepage_href = None
        for item in items_with_property(opf.root, 'calibre:title-page'):
            tid, href = item.get('id'), item.get('href')
            if href and tid:
                titlepage_id, titlepage_href = tid, href.partition('#')[0]
                break
        if titlepage_href is None:
            titlepage_href, titlepage_id = guide_titlepage_href, guide_titlepage_id
        if titlepage_href is not None:
            self.set_guide_type(opf, 'titlepage', titlepage_href, 'Title Page')
            spine = list(opf.iterspine())
            if len(spine) > 1:
                for item in spine:
                    if item.get('idref') == titlepage_id:
                        log('Found HTML cover', titlepage_href)
                        if self.for_viewer:
                            item.attrib.pop('linear', None)
                        else:
                            item.getparent().remove(item)
                            removed = titlepage_href
        return removed

    def rationalize_cover2(self, opf, log):
        ''' Ensure that the cover information in the guide is correct. That
        means, at most one entry with type="cover" that points to a raster
        cover and at most one entry with type="titlepage" that points to an
        HTML titlepage. '''
        from calibre.ebooks.oeb.base import OPF
        removed = None
        from lxml import etree
        guide_cover, guide_elem = None, None
        for guide_elem in opf.iterguide():
            if guide_elem.get('type', '').lower() == 'cover':
                guide_cover = guide_elem.get('href', '').partition('#')[0]
                break
        if not guide_cover:
            raster_cover = opf.raster_cover
            if raster_cover:
                if guide_elem is None:
                    g = opf.root.makeelement(OPF('guide'))
                    opf.root.append(g)
                else:
                    g = guide_elem.getparent()
                guide_cover = raster_cover
                guide_elem = g.makeelement(OPF('reference'), attrib={'href':raster_cover, 'type':'cover'})
                g.append(guide_elem)
            return
        spine = list(opf.iterspine())
        if not spine:
            return
        # Check if the cover specified in the guide is also
        # the first element in spine
        idref = spine[0].get('idref', '')
        manifest = list(opf.itermanifest())
        if not manifest:
            return
        elem = [x for x in manifest if x.get('id', '') == idref]
        if not elem or elem[0].get('href', None) != guide_cover:
            return
        log('Found HTML cover', guide_cover)

        # Remove from spine as covers must be treated
        # specially
        if not self.for_viewer:
            if len(spine) == 1:
                log.warn('There is only a single spine item and it is marked as the cover. Removing cover marking.')
                for guide_elem in tuple(opf.iterguide()):
                    if guide_elem.get('type', '').lower() == 'cover':
                        guide_elem.getparent().remove(guide_elem)
                return
            else:
                spine[0].getparent().remove(spine[0])
                removed = guide_cover
        else:
            # Ensure the cover is displayed as the first item in the book, some
            # epub files have it set with linear='no' which causes the cover to
            # display in the end
            spine[0].attrib.pop('linear', None)
            opf.spine[0].is_linear = True
        # Ensure that the guide has a cover entry pointing to a raster cover
        # and a titlepage entry pointing to the html titlepage. The titlepage
        # entry will be used by the epub output plugin, the raster cover entry
        # by other output plugins.

        # Search for a raster cover identified in the OPF
        raster_cover = opf.raster_cover

        # Set the cover guide entry
        if raster_cover is not None:
            guide_elem.set('href', raster_cover)
        else:
            # Render the titlepage to create a raster cover
            from calibre.ebooks import render_html_svg_workaround
            guide_elem.set('href', 'calibre_raster_cover.jpg')
            t = etree.SubElement(
                elem[0].getparent(), OPF('item'), href=guide_elem.get('href'), id='calibre_raster_cover')
            t.set('media-type', 'image/jpeg')
            if os.path.exists(guide_cover):
                renderer = render_html_svg_workaround(guide_cover, log)
                if renderer is not None:
                    with lopen('calibre_raster_cover.jpg', 'wb') as f:
                        f.write(renderer)

        # Set the titlepage guide entry
        self.set_guide_type(opf, 'titlepage', guide_cover, 'Title Page')
        return removed

    def find_opf(self):
        from calibre.utils.xml_parse import safe_xml_fromstring

        def attr(n, attr):
            for k, v in n.attrib.items():
                if k.endswith(attr):
                    return v
        try:
            with lopen('META-INF/container.xml', 'rb') as f:
                root = safe_xml_fromstring(f.read())
                for r in root.xpath('//*[local-name()="rootfile"]'):
                    if attr(r, 'media-type') != "application/oebps-package+xml":
                        continue
                    path = attr(r, 'full-path')
                    if not path:
                        continue
                    path = os.path.join(getcwd(), *path.split('/'))
                    if os.path.exists(path):
                        return path
        except Exception:
            import traceback
            traceback.print_exc()

    def convert(self, stream, options, file_ext, log, accelerators):
        from calibre.utils.zipfile import ZipFile
        from calibre import walk
        from calibre.ebooks import DRMError
        from calibre.ebooks.metadata.opf2 import OPF
        try:
            zf = ZipFile(stream)
            zf.extractall(getcwd())
        except:
            log.exception('EPUB appears to be invalid ZIP file, trying a'
                    ' more forgiving ZIP parser')
            from calibre.utils.localunzip import extractall
            stream.seek(0)
            extractall(stream)
        encfile = os.path.abspath(os.path.join('META-INF', 'encryption.xml'))
        opf = self.find_opf()
        if opf is None:
            for f in walk('.'):
                if f.lower().endswith('.opf') and '__MACOSX' not in f and \
                        not os.path.basename(f).startswith('.'):
                    opf = os.path.abspath(f)
                    break
        path = getattr(stream, 'name', 'stream')

        if opf is None:
            raise ValueError('%s is not a valid EPUB file (could not find opf)'%path)

        opf = os.path.relpath(opf, getcwd())
        parts = os.path.split(opf)
        opf = OPF(opf, os.path.dirname(os.path.abspath(opf)))

        self._encrypted_font_uris = []
        if os.path.exists(encfile):
            if not self.process_encryption(encfile, opf, log):
                raise DRMError(os.path.basename(path))
        self.encrypted_fonts = self._encrypted_font_uris

        if len(parts) > 1 and parts[0]:
            delta = '/'.join(parts[:-1])+'/'

            def normpath(x):
                return posixpath.normpath(delta + x)

            for elem in opf.itermanifest():
                elem.set('href', normpath(elem.get('href')))
            for elem in opf.iterguide():
                elem.set('href', normpath(elem.get('href')))

        f = self.rationalize_cover3 if opf.package_version >= 3.0 else self.rationalize_cover2
        self.removed_cover = f(opf, log)
        if self.removed_cover:
            self.removed_items_to_ignore = (self.removed_cover,)
        epub3_nav = opf.epub3_nav
        if epub3_nav is not None:
            self.convert_epub3_nav(epub3_nav, opf, log, options)

        for x in opf.itermanifest():
            if x.get('media-type', '') == 'application/x-dtbook+xml':
                raise ValueError(
                    'EPUB files with DTBook markup are not supported')

        not_for_spine = set()
        for y in opf.itermanifest():
            id_ = y.get('id', None)
            if id_:
                mt = y.get('media-type', None)
                if mt in {
                        'application/vnd.adobe-page-template+xml',
                        'application/vnd.adobe.page-template+xml',
                        'application/adobe-page-template+xml',
                        'application/adobe.page-template+xml',
                        'application/text'
                }:
                    not_for_spine.add(id_)
                ext = y.get('href', '').rpartition('.')[-1].lower()
                if mt == 'text/plain' and ext in {'otf', 'ttf'}:
                    # some epub authoring software sets font mime types to
                    # text/plain
                    not_for_spine.add(id_)
                    y.set('media-type', 'application/font')

        seen = set()
        for x in list(opf.iterspine()):
            ref = x.get('idref', None)
            if not ref or ref in not_for_spine or ref in seen:
                x.getparent().remove(x)
                continue
            seen.add(ref)

        if len(list(opf.iterspine())) == 0:
            raise ValueError('No valid entries in the spine of this EPUB')

        with lopen('content.opf', 'wb') as nopf:
            nopf.write(opf.render())

        return os.path.abspath('content.opf')

    def convert_epub3_nav(self, nav_path, opf, log, opts):
        from lxml import etree
        from calibre.ebooks.chardet import xml_to_unicode
        from calibre.ebooks.oeb.polish.parsing import parse
        from calibre.ebooks.oeb.base import EPUB_NS, XHTML, NCX_MIME, NCX, urlnormalize, urlunquote, serialize
        from calibre.ebooks.oeb.polish.toc import first_child
        from calibre.utils.xml_parse import safe_xml_fromstring
        from tempfile import NamedTemporaryFile
        with lopen(nav_path, 'rb') as f:
            raw = f.read()
        raw = xml_to_unicode(raw, strip_encoding_pats=True, assume_utf8=True)[0]
        root = parse(raw, log=log)
        ncx = safe_xml_fromstring('<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="eng"><navMap/></ncx>')
        navmap = ncx[0]
        et = '{%s}type' % EPUB_NS
        bn = os.path.basename(nav_path)

        def add_from_li(li, parent):
            href = text = None
            for x in li.iterchildren(XHTML('a'), XHTML('span')):
                text = etree.tostring(
                    x, method='text', encoding='unicode', with_tail=False).strip() or ' '.join(
                        x.xpath('descendant-or-self::*/@title')).strip()
                href = x.get('href')
                if href:
                    if href.startswith('#'):
                        href = bn + href
                    break
            np = parent.makeelement(NCX('navPoint'))
            parent.append(np)
            np.append(np.makeelement(NCX('navLabel')))
            np[0].append(np.makeelement(NCX('text')))
            np[0][0].text = text
            if href:
                np.append(np.makeelement(NCX('content'), attrib={'src':href}))
            return np

        def process_nav_node(node, toc_parent):
            for li in node.iterchildren(XHTML('li')):
                child = add_from_li(li, toc_parent)
                ol = first_child(li, XHTML('ol'))
                if child is not None and ol is not None:
                    process_nav_node(ol, child)

        for nav in root.iterdescendants(XHTML('nav')):
            if nav.get(et) == 'toc':
                ol = first_child(nav, XHTML('ol'))
                if ol is not None:
                    process_nav_node(ol, navmap)
                    break
        else:
            return

        with NamedTemporaryFile(suffix='.ncx', dir=os.path.dirname(nav_path), delete=False) as f:
            f.write(etree.tostring(ncx, encoding='utf-8'))
        ncx_href = os.path.relpath(f.name, getcwd()).replace(os.sep, '/')
        ncx_id = opf.create_manifest_item(ncx_href, NCX_MIME, append=True).get('id')
        for spine in opf.root.xpath('//*[local-name()="spine"]'):
            spine.set('toc', ncx_id)
        opts.epub3_nav_href = urlnormalize(os.path.relpath(nav_path).replace(os.sep, '/'))
        opts.epub3_nav_parsed = root
        if getattr(self, 'removed_cover', None):
            changed = False
            base_path = os.path.dirname(nav_path)
            for elem in root.xpath('//*[@href]'):
                href, frag = elem.get('href').partition('#')[::2]
                link_path = os.path.relpath(os.path.join(base_path, urlunquote(href)), base_path)
                abs_href = urlnormalize(link_path)
                if abs_href == self.removed_cover:
                    changed = True
                    elem.set('data-calibre-removed-titlepage', '1')
            if changed:
                with lopen(nav_path, 'wb') as f:
                    f.write(serialize(root, 'application/xhtml+xml'))

    def postprocess_book(self, oeb, opts, log):
        rc = getattr(self, 'removed_cover', None)
        if rc:
            cover_toc_item = None
            for item in oeb.toc.iterdescendants():
                if item.href and item.href.partition('#')[0] == rc:
                    cover_toc_item = item
                    break
            spine = {x.href for x in oeb.spine}
            if (cover_toc_item is not None and cover_toc_item not in spine):
                oeb.toc.item_that_refers_to_cover = cover_toc_item
548
ebook_converter/ebooks/conversion/plugins/epub_output.py
Normal file
@@ -0,0 +1,548 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import os, shutil, re

from calibre.customize.conversion import (OutputFormatPlugin,
        OptionRecommendation)
from calibre.ptempfile import TemporaryDirectory
from calibre import CurrentDir
from polyglot.builtins import unicode_type, filter, map, zip, range, as_bytes

block_level_tags = (
        'address',
        'body',
        'blockquote',
        'center',
        'dir',
        'div',
        'dl',
        'fieldset',
        'form',
        'h1',
        'h2',
        'h3',
        'h4',
        'h5',
        'h6',
        'hr',
        'isindex',
        'menu',
        'noframes',
        'noscript',
        'ol',
        'p',
        'pre',
        'table',
        'ul',
        )


class EPUBOutput(OutputFormatPlugin):

    name = 'EPUB Output'
    author = 'Kovid Goyal'
    file_type = 'epub'
    commit_name = 'epub_output'
    ui_data = {'versions': ('2', '3')}

    options = {
        OptionRecommendation(name='extract_to',
            help=_('Extract the contents of the generated %s file to the '
                'specified directory. The contents of the directory are first '
                'deleted, so be careful.') % 'EPUB'),

        OptionRecommendation(name='dont_split_on_page_breaks',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Turn off splitting at page breaks. Normally, input '
                'files are automatically split at every page break into '
                'two files. This gives an output e-book that can be '
                'parsed faster and with less resources. However, '
                'splitting is slow and if your source file contains a '
                'very large number of page breaks, you should turn off '
                'splitting on page breaks.'
                )
        ),

        OptionRecommendation(name='flow_size', recommended_value=260,
            help=_('Split all HTML files larger than this size (in KB). '
                'This is necessary as most EPUB readers cannot handle large '
                'file sizes. The default of %defaultKB is the size required '
                'for Adobe Digital Editions. Set to 0 to disable size based splitting.')
        ),

        OptionRecommendation(name='no_default_epub_cover', recommended_value=False,
            help=_('Normally, if the input file has no cover and you don\'t'
                ' specify one, a default cover is generated with the title, '
                'authors, etc. This option disables the generation of this cover.')
        ),

        OptionRecommendation(name='no_svg_cover', recommended_value=False,
            help=_('Do not use SVG for the book cover. Use this option if '
                'your EPUB is going to be used on a device that does not '
                'support SVG, like the iPhone or the JetBook Lite. '
                'Without this option, such devices will display the cover '
                'as a blank page.')
        ),

        OptionRecommendation(name='preserve_cover_aspect_ratio',
            recommended_value=False, help=_(
                'When using an SVG cover, this option will cause the cover to scale '
                'to cover the available screen area, but still preserve its aspect ratio '
                '(ratio of width to height). That means there may be white borders '
                'at the sides or top and bottom of the image, but the image will '
                'never be distorted. Without this option the image may be slightly '
                'distorted, but there will be no borders.'
)
|
||||||
|
),
|
||||||
|
|
||||||
|
OptionRecommendation(name='epub_flatten', recommended_value=False,
|
||||||
|
help=_('This option is needed only if you intend to use the EPUB'
|
||||||
|
' with FBReaderJ. It will flatten the file system inside the'
|
||||||
|
' EPUB, putting all files into the top level.')
|
||||||
|
),
|
||||||
|
|
||||||
|
OptionRecommendation(name='epub_inline_toc', recommended_value=False,
|
||||||
|
help=_('Insert an inline Table of Contents that will appear as part of the main book content.')
|
||||||
|
),
|
||||||
|
|
||||||
|
OptionRecommendation(name='epub_toc_at_end', recommended_value=False,
|
||||||
|
help=_('Put the inserted inline Table of Contents at the end of the book instead of the start.')
|
||||||
|
),
|
||||||
|
|
||||||
|
OptionRecommendation(name='toc_title', recommended_value=None,
|
||||||
|
help=_('Title for any generated in-line table of contents.')
|
||||||
|
),
|
||||||
|
|
||||||
|
OptionRecommendation(name='epub_version', recommended_value='2', choices=ui_data['versions'],
|
||||||
|
help=_('The version of the EPUB file to generate. EPUB 2 is the'
|
||||||
|
' most widely compatible, only use EPUB 3 if you know you'
|
||||||
|
' actually need it.')
|
||||||
|
),
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
recommendations = {('pretty_print', True, OptionRecommendation.HIGH)}
|
||||||
|
|
||||||
|
def workaround_webkit_quirks(self): # {{{
|
||||||
|
from calibre.ebooks.oeb.base import XPath
|
||||||
|
for x in self.oeb.spine:
|
||||||
|
root = x.data
|
||||||
|
body = XPath('//h:body')(root)
|
||||||
|
if body:
|
||||||
|
body = body[0]
|
||||||
|
|
||||||
|
if not hasattr(body, 'xpath'):
|
||||||
|
continue
|
||||||
|
|
||||||
|
for pre in XPath('//h:pre')(body):
|
||||||
|
if not pre.text and len(pre) == 0:
|
||||||
|
pre.tag = 'div'
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
def upshift_markup(self): # {{{
|
||||||
|
'Upgrade markup to comply with XHTML 1.1 where possible'
|
||||||
|
from calibre.ebooks.oeb.base import XPath, XML
|
||||||
|
for x in self.oeb.spine:
|
||||||
|
root = x.data
|
||||||
|
if (not root.get(XML('lang'))) and (root.get('lang')):
|
||||||
|
root.set(XML('lang'), root.get('lang'))
|
||||||
|
body = XPath('//h:body')(root)
|
||||||
|
if body:
|
||||||
|
body = body[0]
|
||||||
|
|
||||||
|
if not hasattr(body, 'xpath'):
|
||||||
|
continue
|
||||||
|
for u in XPath('//h:u')(root):
|
||||||
|
u.tag = 'span'
|
||||||
|
|
||||||
|
seen_ids, seen_names = set(), set()
|
||||||
|
for x in XPath('//*[@id or @name]')(root):
|
||||||
|
eid, name = x.get('id', None), x.get('name', None)
|
||||||
|
if eid:
|
||||||
|
if eid in seen_ids:
|
||||||
|
del x.attrib['id']
|
||||||
|
else:
|
||||||
|
seen_ids.add(eid)
|
||||||
|
if name:
|
||||||
|
if name in seen_names:
|
||||||
|
del x.attrib['name']
|
||||||
|
else:
|
||||||
|
seen_names.add(name)
|
||||||
|
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
def convert(self, oeb, output_path, input_plugin, opts, log):
|
||||||
|
self.log, self.opts, self.oeb = log, opts, oeb
|
||||||
|
|
||||||
|
if self.opts.epub_inline_toc:
|
||||||
|
from calibre.ebooks.mobi.writer8.toc import TOCAdder
|
||||||
|
opts.mobi_toc_at_start = not opts.epub_toc_at_end
|
||||||
|
opts.mobi_passthrough = False
|
||||||
|
opts.no_inline_toc = False
|
||||||
|
TOCAdder(oeb, opts, replace_previous_inline_toc=True, ignore_existing_toc=True)
|
||||||
|
|
||||||
|
if self.opts.epub_flatten:
|
||||||
|
from calibre.ebooks.oeb.transforms.filenames import FlatFilenames
|
||||||
|
FlatFilenames()(oeb, opts)
|
||||||
|
else:
|
||||||
|
from calibre.ebooks.oeb.transforms.filenames import UniqueFilenames
|
||||||
|
UniqueFilenames()(oeb, opts)
|
||||||
|
|
||||||
|
self.workaround_ade_quirks()
|
||||||
|
self.workaround_webkit_quirks()
|
||||||
|
self.upshift_markup()
|
||||||
|
from calibre.ebooks.oeb.transforms.rescale import RescaleImages
|
||||||
|
RescaleImages(check_colorspaces=True)(oeb, opts)
|
||||||
|
|
||||||
|
from calibre.ebooks.oeb.transforms.split import Split
|
||||||
|
split = Split(not self.opts.dont_split_on_page_breaks,
|
||||||
|
max_flow_size=self.opts.flow_size*1024
|
||||||
|
)
|
||||||
|
split(self.oeb, self.opts)
|
||||||
|
|
||||||
|
from calibre.ebooks.oeb.transforms.cover import CoverManager
|
||||||
|
cm = CoverManager(
|
||||||
|
no_default_cover=self.opts.no_default_epub_cover,
|
||||||
|
no_svg_cover=self.opts.no_svg_cover,
|
||||||
|
preserve_aspect_ratio=self.opts.preserve_cover_aspect_ratio)
|
||||||
|
cm(self.oeb, self.opts, self.log)
|
||||||
|
|
||||||
|
self.workaround_sony_quirks()
|
||||||
|
|
||||||
|
if self.oeb.toc.count() == 0:
|
||||||
|
self.log.warn('This EPUB file has no Table of Contents. '
|
||||||
|
'Creating a default TOC')
|
||||||
|
first = next(iter(self.oeb.spine))
|
||||||
|
self.oeb.toc.add(_('Start'), first.href)
|
||||||
|
|
||||||
|
from calibre.ebooks.oeb.base import OPF
|
||||||
|
identifiers = oeb.metadata['identifier']
|
||||||
|
uuid = None
|
||||||
|
for x in identifiers:
|
||||||
|
if x.get(OPF('scheme'), None).lower() == 'uuid' or unicode_type(x).startswith('urn:uuid:'):
|
||||||
|
uuid = unicode_type(x).split(':')[-1]
|
||||||
|
break
|
||||||
|
encrypted_fonts = getattr(input_plugin, 'encrypted_fonts', [])
|
||||||
|
|
||||||
|
if uuid is None:
|
||||||
|
self.log.warn('No UUID identifier found')
|
||||||
|
from uuid import uuid4
|
||||||
|
uuid = unicode_type(uuid4())
|
||||||
|
oeb.metadata.add('identifier', uuid, scheme='uuid', id=uuid)
|
||||||
|
|
||||||
|
if encrypted_fonts and not uuid.startswith('urn:uuid:'):
|
||||||
|
# Apparently ADE requires this value to start with urn:uuid:
|
||||||
|
# for some absurd reason, or it will throw a hissy fit and refuse
|
||||||
|
# to use the obfuscated fonts.
|
||||||
|
for x in identifiers:
|
||||||
|
if unicode_type(x) == uuid:
|
||||||
|
x.content = 'urn:uuid:'+uuid
|
||||||
|
|
||||||
|
with TemporaryDirectory('_epub_output') as tdir:
|
||||||
|
from calibre.customize.ui import plugin_for_output_format
|
||||||
|
metadata_xml = None
|
||||||
|
extra_entries = []
|
||||||
|
if self.is_periodical:
|
||||||
|
if self.opts.output_profile.epub_periodical_format == 'sony':
|
||||||
|
from calibre.ebooks.epub.periodical import sony_metadata
|
||||||
|
metadata_xml, atom_xml = sony_metadata(oeb)
|
||||||
|
extra_entries = [('atom.xml', 'application/atom+xml', atom_xml)]
|
||||||
|
oeb_output = plugin_for_output_format('oeb')
|
||||||
|
oeb_output.convert(oeb, tdir, input_plugin, opts, log)
|
||||||
|
opf = [x for x in os.listdir(tdir) if x.endswith('.opf')][0]
|
||||||
|
self.condense_ncx([os.path.join(tdir, x) for x in os.listdir(tdir)
|
||||||
|
if x.endswith('.ncx')][0])
|
||||||
|
if self.opts.epub_version == '3':
|
||||||
|
self.upgrade_to_epub3(tdir, opf)
|
||||||
|
encryption = None
|
||||||
|
if encrypted_fonts:
|
||||||
|
encryption = self.encrypt_fonts(encrypted_fonts, tdir, uuid)
|
||||||
|
|
||||||
|
from calibre.ebooks.epub import initialize_container
|
||||||
|
with initialize_container(output_path, os.path.basename(opf),
|
||||||
|
extra_entries=extra_entries) as epub:
|
||||||
|
epub.add_dir(tdir)
|
||||||
|
if encryption is not None:
|
||||||
|
epub.writestr('META-INF/encryption.xml', as_bytes(encryption))
|
||||||
|
if metadata_xml is not None:
|
||||||
|
epub.writestr('META-INF/metadata.xml',
|
||||||
|
metadata_xml.encode('utf-8'))
|
||||||
|
if opts.extract_to is not None:
|
||||||
|
from calibre.utils.zipfile import ZipFile
|
||||||
|
if os.path.exists(opts.extract_to):
|
||||||
|
if os.path.isdir(opts.extract_to):
|
||||||
|
shutil.rmtree(opts.extract_to)
|
||||||
|
else:
|
||||||
|
os.remove(opts.extract_to)
|
||||||
|
os.mkdir(opts.extract_to)
|
||||||
|
with ZipFile(output_path) as zf:
|
||||||
|
zf.extractall(path=opts.extract_to)
|
||||||
|
self.log.info('EPUB extracted to', opts.extract_to)
|
||||||
|
|
||||||
|
def upgrade_to_epub3(self, tdir, opf):
|
||||||
|
self.log.info('Upgrading to EPUB 3...')
|
||||||
|
from calibre.ebooks.epub import simple_container_xml
|
||||||
|
from calibre.ebooks.oeb.polish.cover import fix_conversion_titlepage_links_in_nav
|
||||||
|
try:
|
||||||
|
os.mkdir(os.path.join(tdir, 'META-INF'))
|
||||||
|
except EnvironmentError:
|
||||||
|
pass
|
||||||
|
with open(os.path.join(tdir, 'META-INF', 'container.xml'), 'wb') as f:
|
||||||
|
f.write(simple_container_xml(os.path.basename(opf)).encode('utf-8'))
|
||||||
|
from calibre.ebooks.oeb.polish.container import EpubContainer
|
||||||
|
container = EpubContainer(tdir, self.log)
|
||||||
|
from calibre.ebooks.oeb.polish.upgrade import epub_2_to_3
|
||||||
|
existing_nav = getattr(self.opts, 'epub3_nav_parsed', None)
|
||||||
|
nav_href = getattr(self.opts, 'epub3_nav_href', None)
|
||||||
|
previous_nav = (nav_href, existing_nav) if existing_nav and nav_href else None
|
||||||
|
epub_2_to_3(container, self.log.info, previous_nav=previous_nav)
|
||||||
|
fix_conversion_titlepage_links_in_nav(container)
|
||||||
|
container.commit()
|
||||||
|
os.remove(f.name)
|
||||||
|
try:
|
||||||
|
os.rmdir(os.path.join(tdir, 'META-INF'))
|
||||||
|
except EnvironmentError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
def encrypt_fonts(self, uris, tdir, uuid): # {{{
|
||||||
|
from polyglot.binary import from_hex_bytes
|
||||||
|
|
||||||
|
key = re.sub(r'[^a-fA-F0-9]', '', uuid)
|
||||||
|
if len(key) < 16:
|
||||||
|
raise ValueError('UUID identifier %r is invalid'%uuid)
|
||||||
|
key = bytearray(from_hex_bytes((key + key)[:32]))
|
||||||
|
paths = []
|
||||||
|
with CurrentDir(tdir):
|
||||||
|
paths = [os.path.join(*x.split('/')) for x in uris]
|
||||||
|
uris = dict(zip(uris, paths))
|
||||||
|
fonts = []
|
||||||
|
for uri in list(uris.keys()):
|
||||||
|
path = uris[uri]
|
||||||
|
if not os.path.exists(path):
|
||||||
|
uris.pop(uri)
|
||||||
|
continue
|
||||||
|
self.log.debug('Encrypting font:', uri)
|
||||||
|
with lopen(path, 'r+b') as f:
|
||||||
|
data = f.read(1024)
|
||||||
|
if len(data) >= 1024:
|
||||||
|
data = bytearray(data)
|
||||||
|
f.seek(0)
|
||||||
|
f.write(bytes(bytearray(data[i] ^ key[i%16] for i in range(1024))))
|
||||||
|
else:
|
||||||
|
self.log.warn('Font', path, 'is invalid, ignoring')
|
||||||
|
if not isinstance(uri, unicode_type):
|
||||||
|
uri = uri.decode('utf-8')
|
||||||
|
fonts.append('''
|
||||||
|
<enc:EncryptedData>
|
||||||
|
<enc:EncryptionMethod Algorithm="http://ns.adobe.com/pdf/enc#RC"/>
|
||||||
|
<enc:CipherData>
|
||||||
|
<enc:CipherReference URI="%s"/>
|
||||||
|
</enc:CipherData>
|
||||||
|
</enc:EncryptedData>
|
||||||
|
'''%(uri.replace('"', '\\"')))
|
||||||
|
if fonts:
|
||||||
|
ans = '''<encryption
|
||||||
|
xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
|
||||||
|
xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
|
||||||
|
xmlns:deenc="http://ns.adobe.com/digitaleditions/enc">
|
||||||
|
'''
|
||||||
|
ans += '\n'.join(fonts)
|
||||||
|
ans += '\n</encryption>'
|
||||||
|
return ans
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
def condense_ncx(self, ncx_path): # {{{
|
||||||
|
from lxml import etree
|
||||||
|
if not self.opts.pretty_print:
|
||||||
|
tree = etree.parse(ncx_path)
|
||||||
|
for tag in tree.getroot().iter(tag=etree.Element):
|
||||||
|
if tag.text:
|
||||||
|
tag.text = tag.text.strip()
|
||||||
|
if tag.tail:
|
||||||
|
tag.tail = tag.tail.strip()
|
||||||
|
compressed = etree.tostring(tree.getroot(), encoding='utf-8')
|
||||||
|
with open(ncx_path, 'wb') as f:
|
||||||
|
f.write(compressed)
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
def workaround_ade_quirks(self): # {{{
|
||||||
|
'''
|
||||||
|
Perform various markup transforms to get the output to render correctly
|
||||||
|
in the quirky ADE.
|
||||||
|
'''
|
||||||
|
from calibre.ebooks.oeb.base import XPath, XHTML, barename, urlunquote
|
||||||
|
|
||||||
|
stylesheet = self.oeb.manifest.main_stylesheet
|
||||||
|
|
||||||
|
# ADE cries big wet tears when it encounters an invalid fragment
|
||||||
|
# identifier in the NCX toc.
|
||||||
|
frag_pat = re.compile(r'[-A-Za-z0-9_:.]+$')
|
||||||
|
for node in self.oeb.toc.iter():
|
||||||
|
href = getattr(node, 'href', None)
|
||||||
|
if hasattr(href, 'partition'):
|
||||||
|
base, _, frag = href.partition('#')
|
||||||
|
frag = urlunquote(frag)
|
||||||
|
if frag and frag_pat.match(frag) is None:
|
||||||
|
self.log.warn(
|
||||||
|
'Removing fragment identifier %r from TOC as Adobe Digital Editions cannot handle it'%frag)
|
||||||
|
node.href = base
|
||||||
|
|
||||||
|
for x in self.oeb.spine:
|
||||||
|
root = x.data
|
||||||
|
body = XPath('//h:body')(root)
|
||||||
|
if body:
|
||||||
|
body = body[0]
|
||||||
|
|
||||||
|
if hasattr(body, 'xpath'):
|
||||||
|
# remove <img> tags with empty src elements
|
||||||
|
bad = []
|
||||||
|
for x in XPath('//h:img')(body):
|
||||||
|
src = x.get('src', '').strip()
|
||||||
|
if src in ('', '#') or src.startswith('http:'):
|
||||||
|
bad.append(x)
|
||||||
|
for img in bad:
|
||||||
|
img.getparent().remove(img)
|
||||||
|
|
||||||
|
# Add id attribute to <a> tags that have name
|
||||||
|
for x in XPath('//h:a[@name]')(body):
|
||||||
|
if not x.get('id', False):
|
||||||
|
x.set('id', x.get('name'))
|
||||||
|
# The delightful epubcheck has started complaining about <a> tags that
|
||||||
|
# have name attributes.
|
||||||
|
x.attrib.pop('name')
|
||||||
|
|
||||||
|
# Replace <br> that are children of <body> as ADE doesn't handle them
|
||||||
|
for br in XPath('./h:br')(body):
|
||||||
|
if br.getparent() is None:
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
prior = next(br.itersiblings(preceding=True))
|
||||||
|
priortag = barename(prior.tag)
|
||||||
|
priortext = prior.tail
|
||||||
|
except:
|
||||||
|
priortag = 'body'
|
||||||
|
priortext = body.text
|
||||||
|
if priortext:
|
||||||
|
priortext = priortext.strip()
|
||||||
|
br.tag = XHTML('p')
|
||||||
|
br.text = '\u00a0'
|
||||||
|
style = br.get('style', '').split(';')
|
||||||
|
style = list(filter(None, map(lambda x: x.strip(), style)))
|
||||||
|
style.append('margin:0pt; border:0pt')
|
||||||
|
# If the prior tag is a block (including a <br> we replaced)
|
||||||
|
# then this <br> replacement should have a 1-line height.
|
||||||
|
# Otherwise it should have no height.
|
||||||
|
if not priortext and priortag in block_level_tags:
|
||||||
|
style.append('height:1em')
|
||||||
|
else:
|
||||||
|
style.append('height:0pt')
|
||||||
|
br.set('style', '; '.join(style))
|
||||||
|
|
||||||
|
for tag in XPath('//h:embed')(root):
|
||||||
|
tag.getparent().remove(tag)
|
||||||
|
for tag in XPath('//h:object')(root):
|
||||||
|
if tag.get('type', '').lower().strip() in {'image/svg+xml', 'application/svg+xml'}:
|
||||||
|
continue
|
||||||
|
tag.getparent().remove(tag)
|
||||||
|
|
||||||
|
for tag in XPath('//h:title|//h:style')(root):
|
||||||
|
if not tag.text:
|
||||||
|
tag.getparent().remove(tag)
|
||||||
|
for tag in XPath('//h:script')(root):
|
||||||
|
if (not tag.text and not tag.get('src', False) and tag.get('type', None) != 'text/x-mathjax-config'):
|
||||||
|
tag.getparent().remove(tag)
|
||||||
|
for tag in XPath('//h:body/descendant::h:script')(root):
|
||||||
|
tag.getparent().remove(tag)
|
||||||
|
|
||||||
|
formchildren = XPath('./h:input|./h:button|./h:textarea|'
|
||||||
|
'./h:label|./h:fieldset|./h:legend')
|
||||||
|
for tag in XPath('//h:form')(root):
|
||||||
|
if formchildren(tag):
|
||||||
|
tag.getparent().remove(tag)
|
||||||
|
else:
|
||||||
|
# Not a real form
|
||||||
|
tag.tag = XHTML('div')
|
||||||
|
|
||||||
|
for tag in XPath('//h:center')(root):
|
||||||
|
tag.tag = XHTML('div')
|
||||||
|
tag.set('style', 'text-align:center')
|
||||||
|
# ADE can't handle & in an img url
|
||||||
|
for tag in XPath('//h:img[@src]')(root):
|
||||||
|
tag.set('src', tag.get('src', '').replace('&', ''))
|
||||||
|
|
||||||
|
# ADE whimpers in fright when it encounters a <td> outside a
|
||||||
|
# <table>
|
||||||
|
in_table = XPath('ancestor::h:table')
|
||||||
|
for tag in XPath('//h:td|//h:tr|//h:th')(root):
|
||||||
|
if not in_table(tag):
|
||||||
|
tag.tag = XHTML('div')
|
||||||
|
|
||||||
|
# ADE fails to render non breaking hyphens/soft hyphens/zero width spaces
|
||||||
|
special_chars = re.compile('[\u200b\u00ad]')
|
||||||
|
for elem in root.iterdescendants('*'):
|
||||||
|
if elem.text:
|
||||||
|
elem.text = special_chars.sub('', elem.text)
|
||||||
|
elem.text = elem.text.replace('\u2011', '-')
|
||||||
|
if elem.tail:
|
||||||
|
elem.tail = special_chars.sub('', elem.tail)
|
||||||
|
elem.tail = elem.tail.replace('\u2011', '-')
|
||||||
|
|
||||||
|
if stylesheet is not None:
|
||||||
|
# ADE doesn't render lists correctly if they have left margins
|
||||||
|
from css_parser.css import CSSRule
|
||||||
|
for lb in XPath('//h:ul[@class]|//h:ol[@class]')(root):
|
||||||
|
sel = '.'+lb.get('class')
|
||||||
|
for rule in stylesheet.data.cssRules.rulesOfType(CSSRule.STYLE_RULE):
|
||||||
|
if sel == rule.selectorList.selectorText:
|
||||||
|
rule.style.removeProperty('margin-left')
|
||||||
|
# padding-left breaks rendering in webkit and gecko
|
||||||
|
rule.style.removeProperty('padding-left')
|
||||||
|
# Change whitespace:pre to pre-wrap to accommodate readers that
|
||||||
|
# cannot scroll horizontally
|
||||||
|
for rule in stylesheet.data.cssRules.rulesOfType(CSSRule.STYLE_RULE):
|
||||||
|
style = rule.style
|
||||||
|
ws = style.getPropertyValue('white-space')
|
||||||
|
if ws == 'pre':
|
||||||
|
style.setProperty('white-space', 'pre-wrap')
|
||||||
|
|
||||||
|
# }}}
|
||||||
|
|
||||||
|
def workaround_sony_quirks(self): # {{{
|
||||||
|
'''
|
||||||
|
Perform toc link transforms to alleviate slow loading.
|
||||||
|
'''
|
||||||
|
from calibre.ebooks.oeb.base import urldefrag, XPath
|
||||||
|
from calibre.ebooks.oeb.polish.toc import item_at_top
|
||||||
|
|
||||||
|
def frag_is_at_top(root, frag):
|
||||||
|
elem = XPath('//*[@id="%s" or @name="%s"]'%(frag, frag))(root)
|
||||||
|
if elem:
|
||||||
|
elem = elem[0]
|
||||||
|
else:
|
||||||
|
return False
|
||||||
|
return item_at_top(elem)
|
||||||
|
|
||||||
|
def simplify_toc_entry(toc):
|
||||||
|
if toc.href:
|
||||||
|
href, frag = urldefrag(toc.href)
|
||||||
|
if frag:
|
||||||
|
for x in self.oeb.spine:
|
||||||
|
if x.href == href:
|
||||||
|
if frag_is_at_top(x.data, frag):
|
||||||
|
self.log.debug('Removing anchor from TOC href:',
|
||||||
|
href+'#'+frag)
|
||||||
|
toc.href = href
|
||||||
|
break
|
||||||
|
for x in toc:
|
||||||
|
simplify_toc_entry(x)
|
||||||
|
|
||||||
|
if self.oeb.toc:
|
||||||
|
simplify_toc_entry(self.oeb.toc)
|
||||||
|
|
||||||
|
# }}}
|
||||||
ebook_converter/ebooks/conversion/plugins/fb2_input.py (new file, 179 lines)
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Anatoly Shipitsin <norguhtar at gmail.com>'
"""
Convert .fb2 files to .lrf
"""
import os, re

from calibre.customize.conversion import InputFormatPlugin, OptionRecommendation
from calibre import guess_type
from polyglot.builtins import iteritems, getcwd

FB2NS = 'http://www.gribuser.ru/xml/fictionbook/2.0'
FB21NS = 'http://www.gribuser.ru/xml/fictionbook/2.1'


class FB2Input(InputFormatPlugin):

    name = 'FB2 Input'
    author = 'Anatoly Shipitsin'
    description = 'Convert FB2 and FBZ files to HTML'
    file_types = {'fb2', 'fbz'}
    commit_name = 'fb2_input'

    recommendations = {
        ('level1_toc', '//h:h1', OptionRecommendation.MED),
        ('level2_toc', '//h:h2', OptionRecommendation.MED),
        ('level3_toc', '//h:h3', OptionRecommendation.MED),
        }

    options = {
        OptionRecommendation(name='no_inline_fb2_toc',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Do not insert a Table of Contents at the beginning of the book.'
                )
        )}

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from lxml import etree
        from calibre.utils.xml_parse import safe_xml_fromstring
        from calibre.ebooks.metadata.fb2 import ensure_namespace, get_fb2_data
        from calibre.ebooks.metadata.opf2 import OPFCreator
        from calibre.ebooks.metadata.meta import get_metadata
        from calibre.ebooks.oeb.base import XLINK_NS, XHTML_NS
        from calibre.ebooks.chardet import xml_to_unicode
        self.log = log
        log.debug('Parsing XML...')
        raw = get_fb2_data(stream)[0]
        raw = raw.replace(b'\0', b'')
        raw = xml_to_unicode(raw, strip_encoding_pats=True,
                assume_utf8=True, resolve_entities=True)[0]
        try:
            doc = safe_xml_fromstring(raw)
        except etree.XMLSyntaxError:
            doc = safe_xml_fromstring(raw.replace('& ', '&amp;'))
        if doc is None:
            raise ValueError('The FB2 file is not valid XML')
        doc = ensure_namespace(doc)
        try:
            fb_ns = doc.nsmap[doc.prefix]
        except Exception:
            fb_ns = FB2NS

        NAMESPACES = {'f':fb_ns, 'l':XLINK_NS}
        stylesheets = doc.xpath('//*[local-name() = "stylesheet" and @type="text/css"]')
        css = ''
        for s in stylesheets:
            css += etree.tostring(s, encoding='unicode', method='text',
                    with_tail=False) + '\n\n'
        if css:
            import css_parser, logging
            parser = css_parser.CSSParser(fetcher=None,
                    log=logging.getLogger('calibre.css'))

            XHTML_CSS_NAMESPACE = '@namespace "%s";\n' % XHTML_NS
            text = XHTML_CSS_NAMESPACE + css
            log.debug('Parsing stylesheet...')
            stylesheet = parser.parseString(text)
            stylesheet.namespaces['h'] = XHTML_NS
            css = stylesheet.cssText
            if isinstance(css, bytes):
                css = css.decode('utf-8', 'replace')
            css = css.replace('h|style', 'h|span')
            css = re.sub(r'name\s*=\s*', 'class=', css)
        self.extract_embedded_content(doc)
        log.debug('Converting XML to HTML...')
        with open(P('templates/fb2.xsl'), 'rb') as f:
            ss = f.read().decode('utf-8')
        ss = ss.replace("__FB_NS__", fb_ns)
        if options.no_inline_fb2_toc:
            log('Disabling generation of inline FB2 TOC')
            ss = re.compile(r'<!-- BUILD TOC -->.*<!-- END BUILD TOC -->',
                    re.DOTALL).sub('', ss)

        styledoc = safe_xml_fromstring(ss)

        transform = etree.XSLT(styledoc)
        result = transform(doc)

        # Handle links of type note and cite
        notes = {a.get('href')[1:]: a for a in result.xpath('//a[@link_note and @href]') if a.get('href').startswith('#')}
        cites = {a.get('link_cite'): a for a in result.xpath('//a[@link_cite]') if not a.get('href', '')}
        all_ids = {x for x in result.xpath('//*/@id')}
        for cite, a in iteritems(cites):
            note = notes.get(cite, None)
            if note:
                c = 1
                while 'cite%d' % c in all_ids:
                    c += 1
                if not note.get('id', None):
                    note.set('id', 'cite%d' % c)
                all_ids.add(note.get('id'))
                a.set('href', '#%s' % note.get('id'))
        for x in result.xpath('//*[@link_note or @link_cite]'):
            x.attrib.pop('link_note', None)
            x.attrib.pop('link_cite', None)

        for img in result.xpath('//img[@src]'):
            src = img.get('src')
            img.set('src', self.binary_map.get(src, src))
        index = transform.tostring(result)
        with open('index.xhtml', 'wb') as f:
            f.write(index.encode('utf-8'))
        with open('inline-styles.css', 'wb') as f:
            f.write(css.encode('utf-8'))
        stream.seek(0)
        mi = get_metadata(stream, 'fb2')
        if not mi.title:
            mi.title = _('Unknown')
        if not mi.authors:
            mi.authors = [_('Unknown')]
        cpath = None
        if mi.cover_data and mi.cover_data[1]:
            with open('fb2_cover_calibre_mi.jpg', 'wb') as f:
                f.write(mi.cover_data[1])
            cpath = os.path.abspath('fb2_cover_calibre_mi.jpg')
        else:
            for img in doc.xpath('//f:coverpage/f:image', namespaces=NAMESPACES):
                href = img.get('{%s}href'%XLINK_NS, img.get('href', None))
                if href is not None:
                    if href.startswith('#'):
                        href = href[1:]
                    cpath = os.path.abspath(href)
                    break

        opf = OPFCreator(getcwd(), mi)
        entries = [(f2, guess_type(f2)[0]) for f2 in os.listdir(u'.')]
        opf.create_manifest(entries)
        opf.create_spine(['index.xhtml'])
        if cpath:
            opf.guide.set_cover(cpath)
        with open('metadata.opf', 'wb') as f:
            opf.render(f)
        return os.path.join(getcwd(), 'metadata.opf')

    def extract_embedded_content(self, doc):
        from calibre.ebooks.fb2 import base64_decode
        self.binary_map = {}
        for elem in doc.xpath('./*'):
            if elem.text and 'binary' in elem.tag and 'id' in elem.attrib:
                ct = elem.get('content-type', '')
                fname = elem.attrib['id']
                ext = ct.rpartition('/')[-1].lower()
                if ext in ('png', 'jpeg', 'jpg'):
                    if fname.lower().rpartition('.')[-1] not in {'jpg', 'jpeg',
                            'png'}:
                        fname += '.' + ext
                self.binary_map[elem.get('id')] = fname
                raw = elem.text.strip()
                try:
                    data = base64_decode(raw)
                except TypeError:
                    self.log.exception('Binary data with id=%s is corrupted, ignoring'%(
                        elem.get('id')))
                else:
                    with open(fname, 'wb') as f:
                        f.write(data)
ebook_converter/ebooks/conversion/plugins/fb2_output.py (new file, 203 lines)
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import OutputFormatPlugin, OptionRecommendation


class FB2Output(OutputFormatPlugin):

    name = 'FB2 Output'
    author = 'John Schember'
    file_type = 'fb2'
    commit_name = 'fb2_output'

    FB2_GENRES = [
        # Science Fiction & Fantasy
        'sf_history',  # Alternative history
        'sf_action',  # Action
        'sf_epic',  # Epic
        'sf_heroic',  # Heroic
        'sf_detective',  # Detective
        'sf_cyberpunk',  # Cyberpunk
        'sf_space',  # Space
        'sf_social',  # Social-philosophical
        'sf_horror',  # Horror & mystic
        'sf_humor',  # Humor
        'sf_fantasy',  # Fantasy
        'sf',  # Science Fiction
        # Detectives & Thrillers
        'det_classic',  # Classical detectives
        'det_police',  # Police Stories
        'det_action',  # Action
        'det_irony',  # Ironical detectives
        'det_history',  # Historical detectives
        'det_espionage',  # Espionage detectives
        'det_crime',  # Crime detectives
        'det_political',  # Political detectives
        'det_maniac',  # Maniacs
        'det_hard',  # Hard-boiled
        'thriller',  # Thrillers
        'detective',  # Detectives
        # Prose
        'prose_classic',  # Classics prose
        'prose_history',  # Historical prose
        'prose_contemporary',  # Contemporary prose
        'prose_counter',  # Counterculture
        'prose_rus_classic',  # Russian classics prose
        'prose_su_classics',  # Soviet classics prose
        # Romance
        'love_contemporary',  # Contemporary Romance
        'love_history',  # Historical Romance
        'love_detective',  # Detective Romance
        'love_short',  # Short Romance
        'love_erotica',  # Erotica
        # Adventure
        'adv_western',  # Western
        'adv_history',  # History
        'adv_indian',  # Indians
        'adv_maritime',  # Maritime Fiction
        'adv_geo',  # Travel & geography
        'adv_animal',  # Nature & animals
        'adventure',  # Other
        # Children's
        'child_tale',  # Fairy Tales
        'child_verse',  # Verses
        'child_prose',  # Prose
        'child_sf',  # Science Fiction
        'child_det',  # Detectives & Thrillers
        'child_adv',  # Adventures
        'child_education',  # Educational
        'children',  # Other
        # Poetry & Dramaturgy
        'poetry',  # Poetry
        'dramaturgy',  # Dramaturgy
        # Antique literature
        'antique_ant',  # Antique
        'antique_european',  # European
        'antique_russian',  # Old russian
        'antique_east',  # Old east
        'antique_myths',  # Myths. Legends. Epos
        'antique',  # Other
        # Scientific-educational
        'sci_history',  # History
        'sci_psychology',  # Psychology
        'sci_culture',  # Cultural science
        'sci_religion',  # Religious studies
        'sci_philosophy',  # Philosophy
        'sci_politics',  # Politics
        'sci_business',  # Business literature
        'sci_juris',  # Jurisprudence
        'sci_linguistic',  # Linguistics
        'sci_medicine',  # Medicine
        'sci_phys',  # Physics
        'sci_math',  # Mathematics
        'sci_chem',  # Chemistry
        'sci_biology',  # Biology
        'sci_tech',  # Technical
        'science',  # Other
        # Computers & Internet
        'comp_www',  # Internet
        'comp_programming',  # Programming
        'comp_hard',  # Hardware
        'comp_soft',  # Software
        'comp_db',  # Databases
        'comp_osnet',  # OS & Networking
        'computers',  # Other
        # Reference
        'ref_encyc',  # Encyclopedias
        'ref_dict',  # Dictionaries
        'ref_ref',  # Reference
        'ref_guide',  # Guidebooks
        'reference',  # Other
        # Nonfiction
        'nonf_biography',  # Biography & Memoirs
        'nonf_publicism',  # Publicism
        'nonf_criticism',  # Criticism
        'design',  # Art & design
        'nonfiction',  # Other
        # Religion & Inspiration
        'religion_rel',  # Religion
        'religion_esoterics',  # Esoterics
        'religion_self',  # Self-improvement
        'religion',  # Other
        # Humor
        'humor_anecdote',  # Anecdote (funny stories)
        'humor_prose',  # Prose
        'humor_verse',  # Verses
        'humor',  # Other
        # Home & Family
        'home_cooking',  # Cooking
        'home_pets',  # Pets
        'home_crafts',  # Hobbies & Crafts
        'home_entertain',  # Entertaining
        'home_health',  # Health
        'home_garden',  # Garden
        'home_diy',  # Do it yourself
        'home_sport',  # Sports
        'home_sex',  # Erotica & sex
        'home',  # Other
    ]
    ui_data = {
        'sectionize': {
            'toc': _('Section per entry in the ToC'),
|
||||||
|
'files': _('Section per file'),
|
||||||
|
'nothing': _('A single section')
|
||||||
|
},
|
||||||
|
'genres': FB2_GENRES,
|
||||||
|
}
|
||||||
|
|
||||||
|
options = {
|
||||||
|
OptionRecommendation(name='sectionize',
|
||||||
|
recommended_value='files', level=OptionRecommendation.LOW,
|
||||||
|
choices=list(ui_data['sectionize']),
|
||||||
|
help=_('Specify how sections are created:\n'
|
||||||
|
' * nothing: {nothing}\n'
|
||||||
|
' * files: {files}\n'
|
||||||
|
' * toc: {toc}\n'
|
||||||
|
'If ToC based generation fails, adjust the "Structure detection" and/or "Table of Contents" settings '
|
||||||
|
'(turn on "Force use of auto-generated Table of Contents").').format(**ui_data['sectionize'])
|
||||||
|
),
|
||||||
|
OptionRecommendation(name='fb2_genre',
|
||||||
|
recommended_value='antique', level=OptionRecommendation.LOW,
|
||||||
|
choices=FB2_GENRES,
|
||||||
|
help=(_('Genre for the book. Choices: %s\n\n See: ') % ', '.join(FB2_GENRES)
|
||||||
|
) + 'http://www.fictionbook.org/index.php/Eng:FictionBook_2.1_genres ' + _('for a complete list with descriptions.')),
|
||||||
|
}
|
||||||
|
|
||||||
|
def convert(self, oeb_book, output_path, input_plugin, opts, log):
|
||||||
|
from calibre.ebooks.oeb.transforms.jacket import linearize_jacket
|
||||||
|
from calibre.ebooks.oeb.transforms.rasterize import SVGRasterizer, Unavailable
|
||||||
|
from calibre.ebooks.fb2.fb2ml import FB2MLizer
|
||||||
|
|
||||||
|
try:
|
||||||
|
rasterizer = SVGRasterizer()
|
||||||
|
rasterizer(oeb_book, opts)
|
||||||
|
except Unavailable:
|
||||||
|
log.warn('SVG rasterizer unavailable, SVG will not be converted')
|
||||||
|
|
||||||
|
linearize_jacket(oeb_book)
|
||||||
|
|
||||||
|
fb2mlizer = FB2MLizer(log)
|
||||||
|
fb2_content = fb2mlizer.extract_content(oeb_book, opts)
|
||||||
|
|
||||||
|
close = False
|
||||||
|
if not hasattr(output_path, 'write'):
|
||||||
|
close = True
|
||||||
|
if not os.path.exists(os.path.dirname(output_path)) and os.path.dirname(output_path) != '':
|
||||||
|
os.makedirs(os.path.dirname(output_path))
|
||||||
|
out_stream = lopen(output_path, 'wb')
|
||||||
|
else:
|
||||||
|
out_stream = output_path
|
||||||
|
|
||||||
|
out_stream.seek(0)
|
||||||
|
out_stream.truncate()
|
||||||
|
out_stream.write(fb2_content.encode('utf-8', 'replace'))
|
||||||
|
|
||||||
|
if close:
|
||||||
|
out_stream.close()
|
||||||
316
ebook_converter/ebooks/conversion/plugins/html_input.py
Normal file
316
ebook_converter/ebooks/conversion/plugins/html_input.py
Normal file
@@ -0,0 +1,316 @@
|
|||||||
|
#!/usr/bin/env python2
|
||||||
|
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
|
||||||
|
from __future__ import absolute_import, division, print_function, unicode_literals
|
||||||
|
|
||||||
|
__license__ = 'GPL v3'
|
||||||
|
__copyright__ = '2012, Kovid Goyal <kovid@kovidgoyal.net>'
|
||||||
|
__docformat__ = 'restructuredtext en'
|
||||||
|
|
||||||
|
import re, tempfile, os
|
||||||
|
from functools import partial
|
||||||
|
|
||||||
|
from calibre.constants import islinux, isbsd
|
||||||
|
from calibre.customize.conversion import (InputFormatPlugin,
|
||||||
|
OptionRecommendation)
|
||||||
|
from calibre.utils.localization import get_lang
|
||||||
|
from calibre.utils.filenames import ascii_filename
|
||||||
|
from calibre.utils.imghdr import what
|
||||||
|
from polyglot.builtins import unicode_type, zip, getcwd, as_unicode
|
||||||
|
|
||||||
|
|
||||||
|
def sanitize_file_name(x):
|
||||||
|
ans = re.sub(r'\s+', ' ', re.sub(r'[?&=;#]', '_', ascii_filename(x))).strip().rstrip('.')
|
||||||
|
ans, ext = ans.rpartition('.')[::2]
|
||||||
|
return (ans.strip() + '.' + ext.strip()).rstrip('.')
|
||||||
|
|
||||||
|
|
||||||
|
class HTMLInput(InputFormatPlugin):
|
||||||
|
|
||||||
|
name = 'HTML Input'
|
||||||
|
author = 'Kovid Goyal'
|
||||||
|
description = 'Convert HTML and OPF files to an OEB'
|
||||||
|
file_types = {'opf', 'html', 'htm', 'xhtml', 'xhtm', 'shtm', 'shtml'}
|
||||||
|
commit_name = 'html_input'
|
||||||
|
|
||||||
|
options = {
|
||||||
|
OptionRecommendation(name='breadth_first',
|
||||||
|
recommended_value=False, level=OptionRecommendation.LOW,
|
||||||
|
help=_('Traverse links in HTML files breadth first. Normally, '
|
||||||
|
'they are traversed depth first.'
|
||||||
|
)
|
||||||
|
),
|
||||||
|
|
||||||
|
OptionRecommendation(name='max_levels',
|
||||||
|
recommended_value=5, level=OptionRecommendation.LOW,
|
||||||
|
help=_('Maximum levels of recursion when following links in '
|
||||||
|
'HTML files. Must be non-negative. 0 implies that no '
|
||||||
|
'links in the root HTML file are followed. Default is '
|
||||||
|
'%default.'
|
||||||
|
)
|
||||||
|
),
|
||||||
|
|
||||||
|
OptionRecommendation(name='dont_package',
|
||||||
|
recommended_value=False, level=OptionRecommendation.LOW,
|
||||||
|
help=_('Normally this input plugin re-arranges all the input '
|
||||||
|
'files into a standard folder hierarchy. Only use this option '
|
||||||
|
'if you know what you are doing as it can result in various '
|
||||||
|
'nasty side effects in the rest of the conversion pipeline.'
|
||||||
|
)
|
||||||
|
),
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
def convert(self, stream, opts, file_ext, log,
|
||||||
|
accelerators):
|
||||||
|
self._is_case_sensitive = None
|
||||||
|
basedir = getcwd()
|
||||||
|
self.opts = opts
|
||||||
|
|
||||||
|
fname = None
|
||||||
|
if hasattr(stream, 'name'):
|
||||||
|
basedir = os.path.dirname(stream.name)
|
||||||
|
fname = os.path.basename(stream.name)
|
||||||
|
|
||||||
|
if file_ext != 'opf':
|
||||||
|
if opts.dont_package:
|
||||||
|
raise ValueError('The --dont-package option is not supported for an HTML input file')
|
||||||
|
from calibre.ebooks.metadata.html import get_metadata
|
||||||
|
mi = get_metadata(stream)
|
||||||
|
if fname:
|
||||||
|
from calibre.ebooks.metadata.meta import metadata_from_filename
|
||||||
|
fmi = metadata_from_filename(fname)
|
||||||
|
fmi.smart_update(mi)
|
||||||
|
mi = fmi
|
||||||
|
oeb = self.create_oebbook(stream.name, basedir, opts, log, mi)
|
||||||
|
return oeb
|
||||||
|
|
||||||
|
from calibre.ebooks.conversion.plumber import create_oebbook
|
||||||
|
return create_oebbook(log, stream.name, opts,
|
||||||
|
encoding=opts.input_encoding)
|
||||||
|
|
||||||
|
def is_case_sensitive(self, path):
|
||||||
|
if getattr(self, '_is_case_sensitive', None) is not None:
|
||||||
|
return self._is_case_sensitive
|
||||||
|
if not path or not os.path.exists(path):
|
||||||
|
return islinux or isbsd
|
||||||
|
self._is_case_sensitive = not (os.path.exists(path.lower()) and os.path.exists(path.upper()))
|
||||||
|
return self._is_case_sensitive
|
||||||
|
|
||||||
|
def create_oebbook(self, htmlpath, basedir, opts, log, mi):
|
||||||
|
import uuid
|
||||||
|
from calibre.ebooks.conversion.plumber import create_oebbook
|
||||||
|
from calibre.ebooks.oeb.base import (DirContainer,
|
||||||
|
rewrite_links, urlnormalize, urldefrag, BINARY_MIME, OEB_STYLES,
|
||||||
|
xpath, urlquote)
|
||||||
|
from calibre import guess_type
|
||||||
|
from calibre.ebooks.oeb.transforms.metadata import \
|
||||||
|
meta_info_to_oeb_metadata
|
||||||
|
from calibre.ebooks.html.input import get_filelist
|
||||||
|
from calibre.ebooks.metadata import string_to_authors
|
||||||
|
from calibre.utils.localization import canonicalize_lang
|
||||||
|
import css_parser, logging
|
||||||
|
css_parser.log.setLevel(logging.WARN)
|
||||||
|
self.OEB_STYLES = OEB_STYLES
|
||||||
|
oeb = create_oebbook(log, None, opts, self,
|
||||||
|
encoding=opts.input_encoding, populate=False)
|
||||||
|
self.oeb = oeb
|
||||||
|
|
||||||
|
metadata = oeb.metadata
|
||||||
|
meta_info_to_oeb_metadata(mi, metadata, log)
|
||||||
|
if not metadata.language:
|
||||||
|
l = canonicalize_lang(getattr(opts, 'language', None))
|
||||||
|
if not l:
|
||||||
|
oeb.logger.warn('Language not specified')
|
||||||
|
l = get_lang().replace('_', '-')
|
||||||
|
metadata.add('language', l)
|
||||||
|
if not metadata.creator:
|
||||||
|
a = getattr(opts, 'authors', None)
|
||||||
|
if a:
|
||||||
|
a = string_to_authors(a)
|
||||||
|
if not a:
|
||||||
|
oeb.logger.warn('Creator not specified')
|
||||||
|
a = [self.oeb.translate(__('Unknown'))]
|
||||||
|
for aut in a:
|
||||||
|
metadata.add('creator', aut)
|
||||||
|
if not metadata.title:
|
||||||
|
oeb.logger.warn('Title not specified')
|
||||||
|
metadata.add('title', self.oeb.translate(__('Unknown')))
|
||||||
|
bookid = unicode_type(uuid.uuid4())
|
||||||
|
metadata.add('identifier', bookid, id='uuid_id', scheme='uuid')
|
||||||
|
for ident in metadata.identifier:
|
||||||
|
if 'id' in ident.attrib:
|
||||||
|
self.oeb.uid = metadata.identifier[0]
|
||||||
|
break
|
||||||
|
|
||||||
|
filelist = get_filelist(htmlpath, basedir, opts, log)
|
||||||
|
filelist = [f for f in filelist if not f.is_binary]
|
||||||
|
htmlfile_map = {}
|
||||||
|
for f in filelist:
|
||||||
|
path = f.path
|
||||||
|
oeb.container = DirContainer(os.path.dirname(path), log,
|
||||||
|
ignore_opf=True)
|
||||||
|
bname = os.path.basename(path)
|
||||||
|
id, href = oeb.manifest.generate(id='html', href=sanitize_file_name(bname))
|
||||||
|
htmlfile_map[path] = href
|
||||||
|
item = oeb.manifest.add(id, href, 'text/html')
|
||||||
|
if path == htmlpath and '%' in path:
|
||||||
|
bname = urlquote(bname)
|
||||||
|
item.html_input_href = bname
|
||||||
|
oeb.spine.add(item, True)
|
||||||
|
|
||||||
|
self.added_resources = {}
|
||||||
|
self.log = log
|
||||||
|
self.log('Normalizing filename cases')
|
||||||
|
for path, href in htmlfile_map.items():
|
||||||
|
if not self.is_case_sensitive(path):
|
||||||
|
path = path.lower()
|
||||||
|
self.added_resources[path] = href
|
||||||
|
self.urlnormalize, self.DirContainer = urlnormalize, DirContainer
|
||||||
|
self.urldefrag = urldefrag
|
||||||
|
self.guess_type, self.BINARY_MIME = guess_type, BINARY_MIME
|
||||||
|
|
||||||
|
self.log('Rewriting HTML links')
|
||||||
|
for f in filelist:
|
||||||
|
path = f.path
|
||||||
|
dpath = os.path.dirname(path)
|
||||||
|
oeb.container = DirContainer(dpath, log, ignore_opf=True)
|
||||||
|
href = htmlfile_map[path]
|
||||||
|
try:
|
||||||
|
item = oeb.manifest.hrefs[href]
|
||||||
|
except KeyError:
|
||||||
|
item = oeb.manifest.hrefs[urlnormalize(href)]
|
||||||
|
rewrite_links(item.data, partial(self.resource_adder, base=dpath))
|
||||||
|
|
||||||
|
for item in oeb.manifest.values():
|
||||||
|
if item.media_type in self.OEB_STYLES:
|
||||||
|
dpath = None
|
||||||
|
for path, href in self.added_resources.items():
|
||||||
|
if href == item.href:
|
||||||
|
dpath = os.path.dirname(path)
|
||||||
|
break
|
||||||
|
css_parser.replaceUrls(item.data,
|
||||||
|
partial(self.resource_adder, base=dpath))
|
||||||
|
|
||||||
|
toc = self.oeb.toc
|
||||||
|
self.oeb.auto_generated_toc = True
|
||||||
|
titles = []
|
||||||
|
headers = []
|
||||||
|
for item in self.oeb.spine:
|
||||||
|
if not item.linear:
|
||||||
|
continue
|
||||||
|
html = item.data
|
||||||
|
title = ''.join(xpath(html, '/h:html/h:head/h:title/text()'))
|
||||||
|
title = re.sub(r'\s+', ' ', title.strip())
|
||||||
|
if title:
|
||||||
|
titles.append(title)
|
||||||
|
headers.append('(unlabled)')
|
||||||
|
for tag in ('h1', 'h2', 'h3', 'h4', 'h5', 'strong'):
|
||||||
|
expr = '/h:html/h:body//h:%s[position()=1]/text()'
|
||||||
|
header = ''.join(xpath(html, expr % tag))
|
||||||
|
header = re.sub(r'\s+', ' ', header.strip())
|
||||||
|
if header:
|
||||||
|
headers[-1] = header
|
||||||
|
break
|
||||||
|
use = titles
|
||||||
|
if len(titles) > len(set(titles)):
|
||||||
|
use = headers
|
||||||
|
for title, item in zip(use, self.oeb.spine):
|
||||||
|
if not item.linear:
|
||||||
|
continue
|
||||||
|
toc.add(title, item.href)
|
||||||
|
|
||||||
|
oeb.container = DirContainer(getcwd(), oeb.log, ignore_opf=True)
|
||||||
|
return oeb
|
||||||
|
|
||||||
|
def link_to_local_path(self, link_, base=None):
|
||||||
|
from calibre.ebooks.html.input import Link
|
||||||
|
if not isinstance(link_, unicode_type):
|
||||||
|
try:
|
||||||
|
link_ = link_.decode('utf-8', 'error')
|
||||||
|
except:
|
||||||
|
self.log.warn('Failed to decode link %r. Ignoring'%link_)
|
||||||
|
return None, None
|
||||||
|
try:
|
||||||
|
l = Link(link_, base if base else getcwd())
|
||||||
|
except:
|
||||||
|
self.log.exception('Failed to process link: %r'%link_)
|
||||||
|
return None, None
|
||||||
|
if l.path is None:
|
||||||
|
# Not a local resource
|
||||||
|
return None, None
|
||||||
|
link = l.path.replace('/', os.sep).strip()
|
||||||
|
frag = l.fragment
|
||||||
|
if not link:
|
||||||
|
return None, None
|
||||||
|
return link, frag
|
||||||
|
|
||||||
|
def resource_adder(self, link_, base=None):
|
||||||
|
from polyglot.urllib import quote
|
||||||
|
link, frag = self.link_to_local_path(link_, base=base)
|
||||||
|
if link is None:
|
||||||
|
return link_
|
||||||
|
try:
|
||||||
|
if base and not os.path.isabs(link):
|
||||||
|
link = os.path.join(base, link)
|
||||||
|
link = os.path.abspath(link)
|
||||||
|
except:
|
||||||
|
return link_
|
||||||
|
if not os.access(link, os.R_OK):
|
||||||
|
return link_
|
||||||
|
if os.path.isdir(link):
|
||||||
|
self.log.warn(link_, 'is a link to a directory. Ignoring.')
|
||||||
|
return link_
|
||||||
|
if not self.is_case_sensitive(tempfile.gettempdir()):
|
||||||
|
link = link.lower()
|
||||||
|
if link not in self.added_resources:
|
||||||
|
bhref = os.path.basename(link)
|
||||||
|
id, href = self.oeb.manifest.generate(id='added', href=sanitize_file_name(bhref))
|
||||||
|
guessed = self.guess_type(href)[0]
|
||||||
|
media_type = guessed or self.BINARY_MIME
|
||||||
|
if media_type == 'text/plain':
|
||||||
|
self.log.warn('Ignoring link to text file %r'%link_)
|
||||||
|
return None
|
||||||
|
if media_type == self.BINARY_MIME:
|
||||||
|
# Check for the common case, images
|
||||||
|
try:
|
||||||
|
img = what(link)
|
||||||
|
except EnvironmentError:
|
||||||
|
pass
|
||||||
|
else:
|
||||||
|
if img:
|
||||||
|
media_type = self.guess_type('dummy.'+img)[0] or self.BINARY_MIME
|
||||||
|
|
||||||
|
self.oeb.log.debug('Added', link)
|
||||||
|
self.oeb.container = self.DirContainer(os.path.dirname(link),
|
||||||
|
self.oeb.log, ignore_opf=True)
|
||||||
|
# Load into memory
|
||||||
|
item = self.oeb.manifest.add(id, href, media_type)
|
||||||
|
# bhref refers to an already existing file. The read() method of
|
||||||
|
# DirContainer will call unquote on it before trying to read the
|
||||||
|
# file, therefore we quote it here.
|
||||||
|
if isinstance(bhref, unicode_type):
|
||||||
|
bhref = bhref.encode('utf-8')
|
||||||
|
item.html_input_href = as_unicode(quote(bhref))
|
||||||
|
if guessed in self.OEB_STYLES:
|
||||||
|
item.override_css_fetch = partial(
|
||||||
|
self.css_import_handler, os.path.dirname(link))
|
||||||
|
item.data
|
||||||
|
self.added_resources[link] = href
|
||||||
|
|
||||||
|
nlink = self.added_resources[link]
|
||||||
|
if frag:
|
||||||
|
nlink = '#'.join((nlink, frag))
|
||||||
|
return nlink
|
||||||
|
|
||||||
|
def css_import_handler(self, base, href):
|
||||||
|
link, frag = self.link_to_local_path(href, base=base)
|
||||||
|
if link is None or not os.access(link, os.R_OK) or os.path.isdir(link):
|
||||||
|
return (None, None)
|
||||||
|
try:
|
||||||
|
with open(link, 'rb') as f:
|
||||||
|
raw = f.read().decode('utf-8', 'replace')
|
||||||
|
raw = self.oeb.css_preprocessor(raw, add_namespace=False)
|
||||||
|
except:
|
||||||
|
self.log.exception('Failed to read CSS file: %r'%link)
|
||||||
|
return (None, None)
|
||||||
|
return (None, raw)
|
||||||
226
ebook_converter/ebooks/conversion/plugins/html_output.py
Normal file
226
ebook_converter/ebooks/conversion/plugins/html_output.py
Normal file
@@ -0,0 +1,226 @@
|
|||||||
|
from __future__ import absolute_import, division, print_function, unicode_literals
|
||||||
|
|
||||||
|
__license__ = 'GPL 3'
|
||||||
|
__copyright__ = '2010, Fabian Grassl <fg@jusmeum.de>'
|
||||||
|
__docformat__ = 'restructuredtext en'
|
||||||
|
|
||||||
|
import os, re, shutil
|
||||||
|
from os.path import dirname, abspath, relpath as _relpath, exists, basename
|
||||||
|
|
||||||
|
from calibre.customize.conversion import OutputFormatPlugin, OptionRecommendation
|
||||||
|
from calibre import CurrentDir
|
||||||
|
from calibre.ptempfile import PersistentTemporaryDirectory
|
||||||
|
from polyglot.builtins import unicode_type
|
||||||
|
|
||||||
|
|
||||||
|
def relpath(*args):
|
||||||
|
return _relpath(*args).replace(os.sep, '/')
|
||||||
|
|
||||||
|
|
||||||
|
class HTMLOutput(OutputFormatPlugin):
|
||||||
|
|
||||||
|
name = 'HTML Output'
|
||||||
|
author = 'Fabian Grassl'
|
||||||
|
file_type = 'zip'
|
||||||
|
commit_name = 'html_output'
|
||||||
|
|
||||||
|
options = {
|
||||||
|
OptionRecommendation(name='template_css',
|
||||||
|
help=_('CSS file used for the output instead of the default file')),
|
||||||
|
|
||||||
|
OptionRecommendation(name='template_html_index',
|
||||||
|
help=_('Template used for generation of the HTML index file instead of the default file')),
|
||||||
|
|
||||||
|
OptionRecommendation(name='template_html',
|
||||||
|
help=_('Template used for the generation of the HTML contents of the book instead of the default file')),
|
||||||
|
|
||||||
|
OptionRecommendation(name='extract_to',
|
||||||
|
help=_('Extract the contents of the generated ZIP file to the '
|
||||||
|
'specified directory. WARNING: The contents of the directory '
|
||||||
|
'will be deleted.')
|
||||||
|
),
|
||||||
|
}
|
||||||
|
|
||||||
|
recommendations = {('pretty_print', True, OptionRecommendation.HIGH)}
|
||||||
|
|
||||||
|
def generate_toc(self, oeb_book, ref_url, output_dir):
|
||||||
|
'''
|
||||||
|
Generate table of contents
|
||||||
|
'''
|
||||||
|
from lxml import etree
|
||||||
|
from polyglot.urllib import unquote
|
||||||
|
|
||||||
|
from calibre.ebooks.oeb.base import element
|
||||||
|
from calibre.utils.cleantext import clean_xml_chars
|
||||||
|
with CurrentDir(output_dir):
|
||||||
|
def build_node(current_node, parent=None):
|
||||||
|
if parent is None:
|
||||||
|
parent = etree.Element('ul')
|
||||||
|
elif len(current_node.nodes):
|
||||||
|
parent = element(parent, ('ul'))
|
||||||
|
for node in current_node.nodes:
|
||||||
|
point = element(parent, 'li')
|
||||||
|
href = relpath(abspath(unquote(node.href)), dirname(ref_url))
|
||||||
|
if isinstance(href, bytes):
|
||||||
|
href = href.decode('utf-8')
|
||||||
|
link = element(point, 'a', href=clean_xml_chars(href))
|
||||||
|
title = node.title
|
||||||
|
if isinstance(title, bytes):
|
||||||
|
title = title.decode('utf-8')
|
||||||
|
if title:
|
||||||
|
title = re.sub(r'\s+', ' ', title)
|
||||||
|
link.text = clean_xml_chars(title)
|
||||||
|
build_node(node, point)
|
||||||
|
return parent
|
||||||
|
wrap = etree.Element('div')
|
||||||
|
wrap.append(build_node(oeb_book.toc))
|
||||||
|
return wrap
|
||||||
|
|
||||||
|
def generate_html_toc(self, oeb_book, ref_url, output_dir):
|
||||||
|
from lxml import etree
|
||||||
|
|
||||||
|
root = self.generate_toc(oeb_book, ref_url, output_dir)
|
||||||
|
return etree.tostring(root, pretty_print=True, encoding='unicode',
|
||||||
|
xml_declaration=False)
|
||||||
|
|
||||||
|
def convert(self, oeb_book, output_path, input_plugin, opts, log):
|
||||||
|
from lxml import etree
|
||||||
|
from calibre.utils import zipfile
|
||||||
|
from templite import Templite
|
||||||
|
from polyglot.urllib import unquote
|
||||||
|
from calibre.ebooks.html.meta import EasyMeta
|
||||||
|
|
||||||
|
# read template files
|
||||||
|
if opts.template_html_index is not None:
|
||||||
|
with open(opts.template_html_index, 'rb') as f:
|
||||||
|
template_html_index_data = f.read()
|
||||||
|
else:
|
||||||
|
template_html_index_data = P('templates/html_export_default_index.tmpl', data=True)
|
||||||
|
|
||||||
|
if opts.template_html is not None:
|
||||||
|
with open(opts.template_html, 'rb') as f:
|
||||||
|
template_html_data = f.read()
|
||||||
|
else:
|
||||||
|
template_html_data = P('templates/html_export_default.tmpl', data=True)
|
||||||
|
|
||||||
|
if opts.template_css is not None:
|
||||||
|
with open(opts.template_css, 'rb') as f:
|
||||||
|
template_css_data = f.read()
|
||||||
|
else:
|
||||||
|
template_css_data = P('templates/html_export_default.css', data=True)
|
||||||
|
|
||||||
|
template_html_index_data = template_html_index_data.decode('utf-8')
|
||||||
|
template_html_data = template_html_data.decode('utf-8')
|
||||||
|
template_css_data = template_css_data.decode('utf-8')
|
||||||
|
|
||||||
|
self.log = log
|
||||||
|
self.opts = opts
|
||||||
|
meta = EasyMeta(oeb_book.metadata)
|
||||||
|
|
||||||
|
tempdir = os.path.realpath(PersistentTemporaryDirectory())
|
||||||
|
output_file = os.path.join(tempdir,
|
||||||
|
basename(re.sub(r'\.zip', '', output_path)+'.html'))
|
||||||
|
output_dir = re.sub(r'\.html', '', output_file)+'_files'
|
||||||
|
|
||||||
|
if not exists(output_dir):
|
||||||
|
os.makedirs(output_dir)
|
||||||
|
|
||||||
|
css_path = output_dir+os.sep+'calibreHtmlOutBasicCss.css'
|
||||||
|
with open(css_path, 'wb') as f:
|
||||||
|
f.write(template_css_data.encode('utf-8'))
|
||||||
|
|
||||||
|
with open(output_file, 'wb') as f:
|
||||||
|
html_toc = self.generate_html_toc(oeb_book, output_file, output_dir)
|
||||||
|
templite = Templite(template_html_index_data)
|
||||||
|
nextLink = oeb_book.spine[0].href
|
||||||
|
nextLink = relpath(output_dir+os.sep+nextLink, dirname(output_file))
|
||||||
|
cssLink = relpath(abspath(css_path), dirname(output_file))
|
||||||
|
tocUrl = relpath(output_file, dirname(output_file))
|
||||||
|
t = templite.render(has_toc=bool(oeb_book.toc.count()),
|
||||||
|
toc=html_toc, meta=meta, nextLink=nextLink,
|
||||||
|
tocUrl=tocUrl, cssLink=cssLink,
|
||||||
|
firstContentPageLink=nextLink)
|
||||||
|
if isinstance(t, unicode_type):
|
||||||
|
t = t.encode('utf-8')
|
||||||
|
f.write(t)
|
||||||
|
|
||||||
|
with CurrentDir(output_dir):
|
||||||
|
for item in oeb_book.manifest:
|
||||||
|
path = abspath(unquote(item.href))
|
||||||
|
dir = dirname(path)
|
||||||
|
if not exists(dir):
|
||||||
|
os.makedirs(dir)
|
||||||
|
if item.spine_position is not None:
|
||||||
|
with open(path, 'wb') as f:
|
||||||
|
pass
|
||||||
|
else:
|
||||||
|
with open(path, 'wb') as f:
|
||||||
|
f.write(item.bytes_representation)
|
||||||
|
item.unload_data_from_memory(memory=path)
|
||||||
|
|
||||||
|
for item in oeb_book.spine:
|
||||||
|
path = abspath(unquote(item.href))
|
||||||
|
dir = dirname(path)
|
||||||
|
root = item.data.getroottree()
|
||||||
|
|
||||||
|
# get & clean HTML <HEAD>-data
|
||||||
|
head = root.xpath('//h:head', namespaces={'h': 'http://www.w3.org/1999/xhtml'})[0]
|
||||||
|
head_content = etree.tostring(head, pretty_print=True, encoding='unicode')
|
||||||
|
head_content = re.sub(r'\<\/?head.*\>', '', head_content)
|
||||||
|
head_content = re.sub(re.compile(r'\<style.*\/style\>', re.M|re.S), '', head_content)
|
||||||
|
head_content = re.sub(r'<(title)([^>]*)/>', r'<\1\2></\1>', head_content)
|
||||||
|
|
||||||
|
# get & clean HTML <BODY>-data
|
||||||
|
body = root.xpath('//h:body', namespaces={'h': 'http://www.w3.org/1999/xhtml'})[0]
|
||||||
|
ebook_content = etree.tostring(body, pretty_print=True, encoding='unicode')
|
||||||
|
ebook_content = re.sub(r'\<\/?body.*\>', '', ebook_content)
|
||||||
|
ebook_content = re.sub(r'<(div|a|span)([^>]*)/>', r'<\1\2></\1>', ebook_content)
|
||||||
|
|
||||||
|
# generate link to next page
|
||||||
|
if item.spine_position+1 < len(oeb_book.spine):
|
||||||
|
nextLink = oeb_book.spine[item.spine_position+1].href
|
||||||
|
nextLink = relpath(abspath(nextLink), dir)
|
||||||
|
else:
|
||||||
|
nextLink = None
|
||||||
|
|
||||||
|
# generate link to previous page
|
||||||
|
if item.spine_position > 0:
|
||||||
|
prevLink = oeb_book.spine[item.spine_position-1].href
|
||||||
|
prevLink = relpath(abspath(prevLink), dir)
|
||||||
|
else:
|
||||||
|
prevLink = None
|
||||||
|
|
||||||
|
cssLink = relpath(abspath(css_path), dir)
|
||||||
|
tocUrl = relpath(output_file, dir)
|
||||||
|
firstContentPageLink = oeb_book.spine[0].href
|
||||||
|
|
||||||
|
# render template
|
||||||
|
templite = Templite(template_html_data)
|
||||||
|
toc = lambda: self.generate_html_toc(oeb_book, path, output_dir)
|
||||||
|
t = templite.render(ebookContent=ebook_content,
|
||||||
|
prevLink=prevLink, nextLink=nextLink,
|
||||||
|
has_toc=bool(oeb_book.toc.count()), toc=toc,
|
||||||
|
tocUrl=tocUrl, head_content=head_content,
|
||||||
|
meta=meta, cssLink=cssLink,
|
||||||
|
firstContentPageLink=firstContentPageLink)
|
||||||
|
|
||||||
|
# write html to file
|
||||||
|
with open(path, 'wb') as f:
|
||||||
|
f.write(t.encode('utf-8'))
|
||||||
|
item.unload_data_from_memory(memory=path)
|
||||||
|
|
||||||
|
zfile = zipfile.ZipFile(output_path, "w")
|
||||||
|
zfile.add_dir(output_dir, basename(output_dir))
|
||||||
|
zfile.write(output_file, basename(output_file), zipfile.ZIP_DEFLATED)
|
||||||
|
|
||||||
|
if opts.extract_to:
|
||||||
|
if os.path.exists(opts.extract_to):
|
||||||
|
shutil.rmtree(opts.extract_to)
|
||||||
|
os.makedirs(opts.extract_to)
|
||||||
|
zfile.extractall(opts.extract_to)
|
||||||
|
self.log('Zip file extracted to', opts.extract_to)
|
||||||
|
|
||||||
|
zfile.close()
|
||||||
|
|
||||||
|
# cleanup temp dir
|
||||||
|
shutil.rmtree(tempdir)
|
||||||
133
ebook_converter/ebooks/conversion/plugins/htmlz_input.py
Normal file
133
ebook_converter/ebooks/conversion/plugins/htmlz_input.py
Normal file
@@ -0,0 +1,133 @@
|
|||||||
|
# -*- coding: utf-8 -*-
|
||||||
|
from __future__ import absolute_import, division, print_function, unicode_literals
|
||||||
|
|
||||||
|
|
||||||
|
__license__ = 'GPL 3'
|
||||||
|
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
|
||||||
|
__docformat__ = 'restructuredtext en'
|
||||||
|
|
||||||
|
import os
|
||||||
|
|
||||||
|
from calibre import guess_type
|
||||||
|
from calibre.customize.conversion import InputFormatPlugin
|
||||||
|
from polyglot.builtins import getcwd
|
||||||
|
|
||||||
|
|
||||||
|
class HTMLZInput(InputFormatPlugin):
|
||||||
|
|
||||||
|
name = 'HTLZ Input'
|
||||||
|
author = 'John Schember'
|
||||||
|
description = 'Convert HTML files to HTML'
|
||||||
|
file_types = {'htmlz'}
|
||||||
|
commit_name = 'htmlz_input'
|
||||||
|
|
||||||
|
def convert(self, stream, options, file_ext, log,
|
||||||
|
accelerators):
|
||||||
|
from calibre.ebooks.chardet import xml_to_unicode
|
||||||
|
from calibre.ebooks.metadata.opf2 import OPF
|
||||||
|
from calibre.utils.zipfile import ZipFile
|
||||||
|
|
||||||
|
self.log = log
|
||||||
|
html = u''
|
||||||
|
top_levels = []
|
||||||
|
|
||||||
|
# Extract content from zip archive.
|
||||||
|
zf = ZipFile(stream)
|
||||||
|
zf.extractall()
|
||||||
|
|
||||||
|
# Find the HTML file in the archive. It needs to be
|
||||||
|
# top level.
|
||||||
|
index = u''
|
||||||
|
multiple_html = False
|
||||||
|
# Get a list of all top level files in the archive.
|
||||||
|
for x in os.listdir(u'.'):
|
||||||
|
if os.path.isfile(x):
|
||||||
|
top_levels.append(x)
|
||||||
|
# Try to find an index. file.
|
||||||
|
for x in top_levels:
|
||||||
|
if x.lower() in (u'index.html', u'index.xhtml', u'index.htm'):
|
||||||
|
index = x
|
||||||
|
break
|
||||||
|
# Look for multiple HTML files in the archive. We look at the
|
||||||
|
# top level files only as only they matter in HTMLZ.
|
||||||
|
for x in top_levels:
|
||||||
|
if os.path.splitext(x)[1].lower() in (u'.html', u'.xhtml', u'.htm'):
|
||||||
|
# Set index to the first HTML file found if it's not
|
||||||
|
# called index.
|
||||||
|
if not index:
|
||||||
|
index = x
|
||||||
|
else:
|
||||||
|
multiple_html = True
|
||||||
|
# Warn the user if there multiple HTML file in the archive. HTMLZ
|
||||||
|
        # supports a single HTML file. A conversion with a multiple HTML file
        # HTMLZ archive probably won't turn out as the user expects. With
        # Multiple HTML files ZIP input should be used in place of HTMLZ.
        if multiple_html:
            log.warn(_('Multiple HTML files found in the archive. Only %s will be used.') % index)

        if index:
            with open(index, 'rb') as tf:
                html = tf.read()
        else:
            raise Exception(_('No top level HTML file found.'))

        if not html:
            raise Exception(_('Top level HTML file %s is empty') % index)

        # Encoding
        if options.input_encoding:
            ienc = options.input_encoding
        else:
            ienc = xml_to_unicode(html[:4096])[-1]
        html = html.decode(ienc, 'replace')

        # Run the HTML through the html processing plugin.
        from calibre.customize.ui import plugin_for_input_format
        html_input = plugin_for_input_format('html')
        for opt in html_input.options:
            setattr(options, opt.option.name, opt.recommended_value)
        options.input_encoding = 'utf-8'
        base = getcwd()
        htmlfile = os.path.join(base, u'index.html')
        c = 0
        while os.path.exists(htmlfile):
            c += 1
            htmlfile = u'index%d.html'%c
        with open(htmlfile, 'wb') as f:
            f.write(html.encode('utf-8'))
        odi = options.debug_pipeline
        options.debug_pipeline = None
        # Generate oeb from html conversion.
        with open(htmlfile, 'rb') as f:
            oeb = html_input.convert(f, options, 'html', log, {})
        options.debug_pipeline = odi
        os.remove(htmlfile)

        # Set metadata from file.
        from calibre.customize.ui import get_file_type_metadata
        from calibre.ebooks.oeb.transforms.metadata import meta_info_to_oeb_metadata
        mi = get_file_type_metadata(stream, file_ext)
        meta_info_to_oeb_metadata(mi, oeb.metadata, log)

        # Get the cover path from the OPF.
        cover_path = None
        opf = None
        for x in top_levels:
            if os.path.splitext(x)[1].lower() == u'.opf':
                opf = x
                break
        if opf:
            opf = OPF(opf, basedir=getcwd())
            cover_path = opf.raster_cover or opf.cover
        # Set the cover.
        if cover_path:
            cdata = None
            with open(os.path.join(getcwd(), cover_path), 'rb') as cf:
                cdata = cf.read()
            cover_name = os.path.basename(cover_path)
            id, href = oeb.manifest.generate('cover', cover_name)
            oeb.manifest.add(id, href, guess_type(cover_name)[0], data=cdata)
            oeb.guide.add('cover', 'Cover', href)

        return oeb
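The encoding fallback above (honour a user-supplied encoding, otherwise sniff the first 4 KB) can be sketched with the standard library alone. `sniff_encoding` below is a hypothetical, simplified stand-in for calibre's `xml_to_unicode`, not its actual implementation:

```python
import re


def sniff_encoding(raw, default='utf-8'):
    # Look for an XML/HTML encoding declaration in the first 4 KB,
    # mimicking the role xml_to_unicode plays above (a rough stand-in).
    head = raw[:4096]
    m = re.search(rb'encoding=[\'"]([A-Za-z0-9._-]+)[\'"]', head)
    if m is None:
        m = re.search(rb'charset=["\']?([A-Za-z0-9._-]+)', head)
    return m.group(1).decode('ascii') if m else default


def decode_html(raw, input_encoding=None):
    # Prefer the explicit option, as the plugin does with options.input_encoding,
    # and replace undecodable bytes rather than failing.
    ienc = input_encoding or sniff_encoding(raw)
    return raw.decode(ienc, 'replace')
```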
136
ebook_converter/ebooks/conversion/plugins/htmlz_output.py
Normal file
@@ -0,0 +1,136 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals


__license__ = 'GPL 3'
__copyright__ = '2011, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import io
import os

from calibre.customize.conversion import OutputFormatPlugin, \
    OptionRecommendation
from calibre.ptempfile import TemporaryDirectory
from polyglot.builtins import unicode_type


class HTMLZOutput(OutputFormatPlugin):

    name = 'HTMLZ Output'
    author = 'John Schember'
    file_type = 'htmlz'
    commit_name = 'htmlz_output'
    ui_data = {
        'css_choices': {
            'class': _('Use CSS classes'),
            'inline': _('Use the style attribute'),
            'tag': _('Use HTML tags wherever possible')
        },
        'sheet_choices': {
            'external': _('Use an external CSS file'),
            'inline': _('Use a <style> tag in the HTML file')
        }
    }

    options = {
        OptionRecommendation(name='htmlz_css_type', recommended_value='class',
            level=OptionRecommendation.LOW,
            choices=list(ui_data['css_choices']),
            help=_('Specify the handling of CSS. Default is class.\n'
                   'class: {class}\n'
                   'inline: {inline}\n'
                   'tag: {tag}'
            ).format(**ui_data['css_choices'])),
        OptionRecommendation(name='htmlz_class_style', recommended_value='external',
            level=OptionRecommendation.LOW,
            choices=list(ui_data['sheet_choices']),
            help=_('How to handle the CSS when using css-type = \'class\'.\n'
                   'Default is external.\n'
                   'external: {external}\n'
                   'inline: {inline}'
            ).format(**ui_data['sheet_choices'])),
        OptionRecommendation(name='htmlz_title_filename',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('If set this option causes the file name of the HTML file'
                ' inside the HTMLZ archive to be based on the book title.')
        ),
    }

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        from lxml import etree
        from calibre.ebooks.oeb.base import OEB_IMAGES, SVG_MIME
        from calibre.ebooks.metadata.opf2 import OPF, metadata_to_opf
        from calibre.utils.zipfile import ZipFile
        from calibre.utils.filenames import ascii_filename

        # HTML
        if opts.htmlz_css_type == 'inline':
            from calibre.ebooks.htmlz.oeb2html import OEB2HTMLInlineCSSizer
            OEB2HTMLizer = OEB2HTMLInlineCSSizer
        elif opts.htmlz_css_type == 'tag':
            from calibre.ebooks.htmlz.oeb2html import OEB2HTMLNoCSSizer
            OEB2HTMLizer = OEB2HTMLNoCSSizer
        else:
            from calibre.ebooks.htmlz.oeb2html import OEB2HTMLClassCSSizer as OEB2HTMLizer

        with TemporaryDirectory(u'_htmlz_output') as tdir:
            htmlizer = OEB2HTMLizer(log)
            html = htmlizer.oeb2html(oeb_book, opts)

            fname = u'index'
            if opts.htmlz_title_filename:
                from calibre.utils.filenames import shorten_components_to
                fname = shorten_components_to(100, (ascii_filename(unicode_type(oeb_book.metadata.title[0])),))[0]
            with open(os.path.join(tdir, fname+u'.html'), 'wb') as tf:
                if isinstance(html, unicode_type):
                    html = html.encode('utf-8')
                tf.write(html)

            # CSS
            if opts.htmlz_css_type == 'class' and opts.htmlz_class_style == 'external':
                with open(os.path.join(tdir, u'style.css'), 'wb') as tf:
                    tf.write(htmlizer.get_css(oeb_book))

            # Images
            images = htmlizer.images
            if images:
                if not os.path.exists(os.path.join(tdir, u'images')):
                    os.makedirs(os.path.join(tdir, u'images'))
                for item in oeb_book.manifest:
                    if item.media_type in OEB_IMAGES and item.href in images:
                        if item.media_type == SVG_MIME:
                            data = etree.tostring(item.data, encoding='unicode')
                        else:
                            data = item.data
                        fname = os.path.join(tdir, u'images', images[item.href])
                        with open(fname, 'wb') as img:
                            img.write(data)

            # Cover
            cover_path = None
            try:
                cover_data = None
                if oeb_book.metadata.cover:
                    term = oeb_book.metadata.cover[0].term
                    cover_data = oeb_book.guide[term].item.data
                if cover_data:
                    from calibre.utils.img import save_cover_data_to
                    cover_path = os.path.join(tdir, u'cover.jpg')
                    with lopen(cover_path, 'w') as cf:
                        cf.write('')
                    save_cover_data_to(cover_data, cover_path)
            except:
                import traceback
                traceback.print_exc()

            # Metadata
            with open(os.path.join(tdir, u'metadata.opf'), 'wb') as mdataf:
                opf = OPF(io.BytesIO(etree.tostring(oeb_book.metadata.to_opf1(), encoding='UTF-8')))
                mi = opf.to_book_metadata()
                if cover_path:
                    mi.cover = u'cover.jpg'
                mdataf.write(metadata_to_opf(mi))

            htmlz = ZipFile(output_path, 'w')
            htmlz.add_dir(tdir)
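The final step packs the staging directory with calibre's `ZipFile.add_dir`. With only the standard library, the same step might look like the sketch below (an assumed equivalent, not the calibre API):

```python
import os
import zipfile


def zip_dir(output_path, tdir):
    # Pack every file under tdir into the archive root, the way the HTMLZ
    # container expects (index.html, style.css, images/... at top level).
    with zipfile.ZipFile(output_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(tdir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, tdir))
```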
64
ebook_converter/ebooks/conversion/plugins/lit_input.py
Normal file
@@ -0,0 +1,64 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

from calibre.customize.conversion import InputFormatPlugin


class LITInput(InputFormatPlugin):

    name = 'LIT Input'
    author = 'Marshall T. Vandegrift'
    description = 'Convert LIT files to HTML'
    file_types = {'lit'}
    commit_name = 'lit_input'

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from calibre.ebooks.lit.reader import LitReader
        from calibre.ebooks.conversion.plumber import create_oebbook
        self.log = log
        return create_oebbook(log, stream, options, reader=LitReader)

    def postprocess_book(self, oeb, opts, log):
        from calibre.ebooks.oeb.base import XHTML_NS, XPath, XHTML
        for item in oeb.spine:
            root = item.data
            if not hasattr(root, 'xpath'):
                continue
            for bad in ('metadata', 'guide'):
                metadata = XPath('//h:'+bad)(root)
                if metadata:
                    for x in metadata:
                        x.getparent().remove(x)
            body = XPath('//h:body')(root)
            if body:
                body = body[0]
                if len(body) == 1 and body[0].tag == XHTML('pre'):
                    pre = body[0]
                    from calibre.ebooks.txt.processor import convert_basic, \
                        separate_paragraphs_single_line
                    from calibre.ebooks.chardet import xml_to_unicode
                    from calibre.utils.xml_parse import safe_xml_fromstring
                    import copy
                    self.log('LIT file with all text in single <pre> tag detected')
                    html = separate_paragraphs_single_line(pre.text)
                    html = convert_basic(html).replace('<html>',
                            '<html xmlns="%s">'%XHTML_NS)
                    html = xml_to_unicode(html, strip_encoding_pats=True,
                            resolve_entities=True)[0]
                    if opts.smarten_punctuation:
                        # SmartyPants skips text inside <pre> tags
                        from calibre.ebooks.conversion.preprocess import smarten_punctuation
                        html = smarten_punctuation(html, self.log)
                    root = safe_xml_fromstring(html)
                    body = XPath('//h:body')(root)
                    pre.tag = XHTML('div')
                    pre.text = ''
                    for elem in body:
                        ne = copy.deepcopy(elem)
                        pre.append(ne)
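The `postprocess_book` step above rewrites a body that is one giant `<pre>` into real paragraphs. The core DOM move can be sketched with `xml.etree` alone; `explode_pre` below is an illustrative simplification (no namespaces, naive line-based paragraph splitting), not calibre's implementation:

```python
import xml.etree.ElementTree as ET


def explode_pre(root):
    # Mirror of the postprocess step above: if <body> holds a single <pre>,
    # re-parse its text as paragraphs and splice them back in place.
    body = root.find('body')
    if body is None or len(body) != 1 or body[0].tag != 'pre':
        return root
    pre = body[0]
    paras = [p for p in (pre.text or '').splitlines() if p.strip()]
    pre.tag = 'div'
    pre.text = ''
    for p in paras:
        el = ET.SubElement(pre, 'p')
        el.text = p
    return root
```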
38
ebook_converter/ebooks/conversion/plugins/lit_output.py
Normal file
@@ -0,0 +1,38 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'


from calibre.customize.conversion import OutputFormatPlugin


class LITOutput(OutputFormatPlugin):

    name = 'LIT Output'
    author = 'Marshall T. Vandegrift'
    file_type = 'lit'
    commit_name = 'lit_output'

    def convert(self, oeb, output_path, input_plugin, opts, log):
        self.log, self.opts, self.oeb = log, opts, oeb
        from calibre.ebooks.oeb.transforms.manglecase import CaseMangler
        from calibre.ebooks.oeb.transforms.rasterize import SVGRasterizer
        from calibre.ebooks.oeb.transforms.htmltoc import HTMLTOCAdder
        from calibre.ebooks.lit.writer import LitWriter
        from calibre.ebooks.oeb.transforms.split import Split
        split = Split(split_on_page_breaks=True, max_flow_size=0,
                remove_css_pagebreaks=False)
        split(self.oeb, self.opts)

        tocadder = HTMLTOCAdder()
        tocadder(oeb, opts)
        mangler = CaseMangler()
        mangler(oeb, opts)
        rasterizer = SVGRasterizer()
        rasterizer(oeb, opts)
        lit = LitWriter(self.opts)
        lit(oeb, output_path)
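Each transform above (split, TOC adder, case mangler, rasterizer, writer) is a callable applied in order to `(oeb, opts)`. That chaining can be expressed generically, as in this small sketch:

```python
def run_pipeline(oeb, opts, transforms):
    # The convert() above applies each transform as a callable on (oeb, opts);
    # order matters, since later transforms see earlier ones' mutations.
    for t in transforms:
        t(oeb, opts)
    return oeb
```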
82
ebook_converter/ebooks/conversion/plugins/lrf_input.py
Normal file
@@ -0,0 +1,82 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import os, sys
from calibre.customize.conversion import InputFormatPlugin


class LRFInput(InputFormatPlugin):

    name = 'LRF Input'
    author = 'Kovid Goyal'
    description = 'Convert LRF files to HTML'
    file_types = {'lrf'}
    commit_name = 'lrf_input'

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from calibre.ebooks.lrf.input import (MediaType, Styles, TextBlock,
                Canvas, ImageBlock, RuledLine)
        self.log = log
        self.log('Generating XML')
        from calibre.ebooks.lrf.lrfparser import LRFDocument
        from calibre.utils.xml_parse import safe_xml_fromstring
        from lxml import etree
        d = LRFDocument(stream)
        d.parse()
        xml = d.to_xml(write_files=True)
        if options.verbose > 2:
            open(u'lrs.xml', 'wb').write(xml.encode('utf-8'))
        doc = safe_xml_fromstring(xml)

        char_button_map = {}
        for x in doc.xpath('//CharButton[@refobj]'):
            ro = x.get('refobj')
            jump_button = doc.xpath('//*[@objid="%s"]'%ro)
            if jump_button:
                jump_to = jump_button[0].xpath('descendant::JumpTo[@refpage and @refobj]')
                if jump_to:
                    char_button_map[ro] = '%s.xhtml#%s'%(jump_to[0].get('refpage'),
                            jump_to[0].get('refobj'))
        plot_map = {}
        for x in doc.xpath('//Plot[@refobj]'):
            ro = x.get('refobj')
            image = doc.xpath('//Image[@objid="%s" and @refstream]'%ro)
            if image:
                imgstr = doc.xpath('//ImageStream[@objid="%s" and @file]'%
                        image[0].get('refstream'))
                if imgstr:
                    plot_map[ro] = imgstr[0].get('file')

        self.log('Converting XML to HTML...')
        styledoc = safe_xml_fromstring(P('templates/lrf.xsl', data=True))
        media_type = MediaType()
        styles = Styles()
        text_block = TextBlock(styles, char_button_map, plot_map, log)
        canvas = Canvas(doc, styles, text_block, log)
        image_block = ImageBlock(canvas)
        ruled_line = RuledLine()
        extensions = {
            ('calibre', 'media-type') : media_type,
            ('calibre', 'text-block') : text_block,
            ('calibre', 'ruled-line') : ruled_line,
            ('calibre', 'styles') : styles,
            ('calibre', 'canvas') : canvas,
            ('calibre', 'image-block'): image_block,
        }
        transform = etree.XSLT(styledoc, extensions=extensions)
        try:
            result = transform(doc)
        except RuntimeError:
            sys.setrecursionlimit(5000)
            result = transform(doc)

        with open('content.opf', 'wb') as f:
            f.write(result)
        styles.write()
        return os.path.abspath('content.opf')
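The `char_button_map` loop above resolves each `CharButton`'s `refobj` to a `page.xhtml#obj` anchor via the target object's `JumpTo` descendant. A hypothetical stdlib analogue (using `xml.etree` instead of lxml XPath, with an up-front `objid` index so lookup is not quadratic) might look like:

```python
import xml.etree.ElementTree as ET


def char_button_targets(doc):
    # Index objects by objid once, then resolve each CharButton's refobj
    # to a 'page.xhtml#obj' anchor via its JumpTo descendant.
    by_id = {el.get('objid'): el for el in doc.iter() if el.get('objid')}
    out = {}
    for cb in doc.iter('CharButton'):
        ro = cb.get('refobj')
        target = by_id.get(ro)
        if target is None:
            continue
        jt = target.find('.//JumpTo')
        if jt is not None and jt.get('refpage') and jt.get('refobj'):
            out[ro] = '%s.xhtml#%s' % (jt.get('refpage'), jt.get('refobj'))
    return out
```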
196
ebook_converter/ebooks/conversion/plugins/lrf_output.py
Normal file
@@ -0,0 +1,196 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import sys, os

from calibre.customize.conversion import OutputFormatPlugin
from calibre.customize.conversion import OptionRecommendation
from polyglot.builtins import unicode_type


class LRFOptions(object):

    def __init__(self, output, opts, oeb):
        def f2s(f):
            try:
                return unicode_type(f[0])
            except:
                return ''
        m = oeb.metadata
        for x in ('left', 'top', 'right', 'bottom'):
            attr = 'margin_'+x
            val = getattr(opts, attr)
            if val < 0:
                setattr(opts, attr, 0)
        self.title = None
        self.author = self.publisher = _('Unknown')
        self.title_sort = self.author_sort = ''
        for x in m.creator:
            if x.role == 'aut':
                self.author = unicode_type(x)
                fa = unicode_type(getattr(x, 'file_as', ''))
                if fa:
                    self.author_sort = fa
        for x in m.title:
            if unicode_type(x.file_as):
                self.title_sort = unicode_type(x.file_as)
        self.freetext = f2s(m.description)
        self.category = f2s(m.subject)
        self.cover = None
        self.use_metadata_cover = True
        self.output = output
        self.ignore_tables = opts.linearize_tables
        if opts.disable_font_rescaling:
            self.base_font_size = 0
        else:
            self.base_font_size = opts.base_font_size
        self.blank_after_para = opts.insert_blank_line
        self.use_spine = True
        self.font_delta = 0
        self.ignore_colors = False
        from calibre.ebooks.lrf import PRS500_PROFILE
        self.profile = PRS500_PROFILE
        self.link_levels = sys.maxsize
        self.link_exclude = '@'
        self.no_links_in_toc = True
        self.disable_chapter_detection = True
        self.chapter_regex = 'dsadcdswcdec'
        self.chapter_attr = '$,,$'
        self.override_css = self._override_css = ''
        self.page_break = 'h[12]'
        self.force_page_break = '$'
        self.force_page_break_attr = '$'
        self.add_chapters_to_toc = False
        self.baen = self.pdftohtml = self.book_designer = False
        self.verbose = opts.verbose
        self.encoding = 'utf-8'
        self.lrs = False
        self.minimize_memory_usage = False
        self.autorotation = opts.enable_autorotation
        self.header_separation = (self.profile.dpi/72.) * opts.header_separation
        self.headerformat = opts.header_format

        for x in ('top', 'bottom', 'left', 'right'):
            setattr(self, x+'_margin',
                (self.profile.dpi/72.) * float(getattr(opts, 'margin_'+x)))

        for x in ('wordspace', 'header', 'header_format',
                'minimum_indent', 'serif_family',
                'render_tables_as_images', 'sans_family', 'mono_family',
                'text_size_multiplier_for_rendered_tables'):
            setattr(self, x, getattr(opts, x))


class LRFOutput(OutputFormatPlugin):

    name = 'LRF Output'
    author = 'Kovid Goyal'
    file_type = 'lrf'
    commit_name = 'lrf_output'

    options = {
        OptionRecommendation(name='enable_autorotation', recommended_value=False,
            help=_('Enable auto-rotation of images that are wider than the screen width.')
        ),
        OptionRecommendation(name='wordspace',
            recommended_value=2.5, level=OptionRecommendation.LOW,
            help=_('Set the space between words in pts. Default is %default')
        ),
        OptionRecommendation(name='header', recommended_value=False,
            help=_('Add a header to all the pages with title and author.')
        ),
        OptionRecommendation(name='header_format', recommended_value="%t by %a",
            help=_('Set the format of the header. %a is replaced by the author '
                'and %t by the title. Default is %default')
        ),
        OptionRecommendation(name='header_separation', recommended_value=0,
            help=_('Add extra spacing below the header. Default is %default pt.')
        ),
        OptionRecommendation(name='minimum_indent', recommended_value=0,
            help=_('Minimum paragraph indent (the indent of the first line '
                'of a paragraph) in pts. Default: %default')
        ),
        OptionRecommendation(name='render_tables_as_images',
            recommended_value=False,
            help=_('This option has no effect')
        ),
        OptionRecommendation(name='text_size_multiplier_for_rendered_tables',
            recommended_value=1.0,
            help=_('Multiply the size of text in rendered tables by this '
                'factor. Default is %default')
        ),
        OptionRecommendation(name='serif_family', recommended_value=None,
            help=_('The serif family of fonts to embed')
        ),
        OptionRecommendation(name='sans_family', recommended_value=None,
            help=_('The sans-serif family of fonts to embed')
        ),
        OptionRecommendation(name='mono_family', recommended_value=None,
            help=_('The monospace family of fonts to embed')
        ),
    }

    recommendations = {
        ('change_justification', 'original', OptionRecommendation.HIGH)}

    def convert_images(self, pages, opts, wide):
        from calibre.ebooks.lrf.pylrs.pylrs import Book, BookSetting, ImageStream, ImageBlock
        from uuid import uuid4
        from calibre.constants import __appname__, __version__

        width, height = (784, 1012) if wide else (584, 754)

        ps = {}
        ps['topmargin'] = 0
        ps['evensidemargin'] = 0
        ps['oddsidemargin'] = 0
        ps['textwidth'] = width
        ps['textheight'] = height
        book = Book(title=opts.title, author=opts.author,
                bookid=uuid4().hex,
                publisher='%s %s'%(__appname__, __version__),
                category=_('Comic'), pagestyledefault=ps,
                booksetting=BookSetting(screenwidth=width, screenheight=height))
        for page in pages:
            imageStream = ImageStream(page)
            _page = book.create_page()
            _page.append(ImageBlock(refstream=imageStream,
                blockwidth=width, blockheight=height, xsize=width,
                ysize=height, x1=width, y1=height))
            book.append(_page)

        book.renderLrf(open(opts.output, 'wb'))

    def flatten_toc(self):
        from calibre.ebooks.oeb.base import TOC
        nroot = TOC()
        for x in self.oeb.toc.iterdescendants():
            nroot.add(x.title, x.href)
        self.oeb.toc = nroot

    def convert(self, oeb, output_path, input_plugin, opts, log):
        self.log, self.opts, self.oeb = log, opts, oeb

        lrf_opts = LRFOptions(output_path, opts, oeb)

        if input_plugin.is_image_collection:
            self.convert_images(input_plugin.get_images(), lrf_opts,
                    getattr(opts, 'wide', False))
            return

        self.flatten_toc()

        from calibre.ptempfile import TemporaryDirectory
        with TemporaryDirectory('_lrf_output') as tdir:
            from calibre.customize.ui import plugin_for_output_format
            oeb_output = plugin_for_output_format('oeb')
            oeb_output.convert(oeb, tdir, input_plugin, opts, log)
            opf = [x for x in os.listdir(tdir) if x.endswith('.opf')][0]
            from calibre.ebooks.lrf.html.convert_from import process_file
            process_file(os.path.join(tdir, opf), lrf_opts, self.log)
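`flatten_toc` above collapses the nested TOC into a single level by walking all descendants in document order. The same idea on a plain nested data structure (a hypothetical `(title, href, children)` representation, since calibre's `TOC` class is not reproduced here):

```python
def flatten_toc(toc):
    # toc is a nested list of (title, href, children) tuples; return the same
    # entries as a single flat level, in pre-order, mirroring
    # LRFOutput.flatten_toc above.
    flat = []

    def walk(nodes):
        for title, href, children in nodes:
            flat.append((title, href))
            walk(children)

    walk(toc)
    return flat
```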
66
ebook_converter/ebooks/conversion/plugins/mobi_input.py
Normal file
@@ -0,0 +1,66 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import InputFormatPlugin
from polyglot.builtins import unicode_type


class MOBIInput(InputFormatPlugin):

    name = 'MOBI Input'
    author = 'Kovid Goyal'
    description = 'Convert MOBI files (.mobi, .prc, .azw) to HTML'
    file_types = {'mobi', 'prc', 'azw', 'azw3', 'pobi'}
    commit_name = 'mobi_input'

    def convert(self, stream, options, file_ext, log,
                accelerators):
        self.is_kf8 = False
        self.mobi_is_joint = False

        from calibre.ebooks.mobi.reader.mobi6 import MobiReader
        from lxml import html
        parse_cache = {}
        try:
            mr = MobiReader(stream, log, options.input_encoding,
                    options.debug_pipeline)
            if mr.kf8_type is None:
                mr.extract_content('.', parse_cache)

        except:
            mr = MobiReader(stream, log, options.input_encoding,
                    options.debug_pipeline, try_extra_data_fix=True)
            if mr.kf8_type is None:
                mr.extract_content('.', parse_cache)

        if mr.kf8_type is not None:
            log('Found KF8 MOBI of type %r'%mr.kf8_type)
            if mr.kf8_type == 'joint':
                self.mobi_is_joint = True
            from calibre.ebooks.mobi.reader.mobi8 import Mobi8Reader
            mr = Mobi8Reader(mr, log)
            opf = os.path.abspath(mr())
            self.encrypted_fonts = mr.encrypted_fonts
            self.is_kf8 = True
            return opf

        raw = parse_cache.pop('calibre_raw_mobi_markup', False)
        if raw:
            if isinstance(raw, unicode_type):
                raw = raw.encode('utf-8')
            with lopen('debug-raw.html', 'wb') as f:
                f.write(raw)
        from calibre.ebooks.oeb.base import close_self_closing_tags
        for f, root in parse_cache.items():
            raw = html.tostring(root, encoding='utf-8', method='xml',
                    include_meta_content_type=False)
            raw = close_self_closing_tags(raw)
            with lopen(f, 'wb') as q:
                q.write(raw)
        accelerators['pagebreaks'] = '//h:div[@class="mbp_pagebreak"]'
        return mr.created_opf_path
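The `convert` above parses once normally and, if anything goes wrong, retries with `try_extra_data_fix=True`. That retry-with-fallback pattern can be isolated generically; `open_reader` below is a hypothetical factory standing in for `MobiReader`:

```python
def read_with_fallback(open_reader, stream, log):
    # Mirrors the MOBIInput flow above: attempt a normal parse, and on any
    # failure rewind the stream and retry once with the extra-data fix on.
    try:
        return open_reader(stream, log)
    except Exception:
        stream.seek(0)
        return open_reader(stream, log, try_extra_data_fix=True)
```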
337
ebook_converter/ebooks/conversion/plugins/mobi_output.py
Normal file
@@ -0,0 +1,337 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

from calibre.customize.conversion import (OutputFormatPlugin,
    OptionRecommendation)
from polyglot.builtins import unicode_type


def remove_html_cover(oeb, log):
    from calibre.ebooks.oeb.base import OEB_DOCS

    if not oeb.metadata.cover \
            or 'cover' not in oeb.guide:
        return
    href = oeb.guide['cover'].href
    del oeb.guide['cover']
    item = oeb.manifest.hrefs[href]
    if item.spine_position is not None:
        log.warn('Found an HTML cover: ', item.href, 'removing it.',
                'If you find some content missing from the output MOBI, it '
                'is because you misidentified the HTML cover in the input '
                'document')
        oeb.spine.remove(item)
        if item.media_type in OEB_DOCS:
            oeb.manifest.remove(item)


def extract_mobi(output_path, opts):
    if opts.extract_to is not None:
        from calibre.ebooks.mobi.debug.main import inspect_mobi
        ddir = opts.extract_to
        inspect_mobi(output_path, ddir=ddir)


class MOBIOutput(OutputFormatPlugin):

    name = 'MOBI Output'
    author = 'Kovid Goyal'
    file_type = 'mobi'
    commit_name = 'mobi_output'
    ui_data = {'file_types': ['old', 'both', 'new']}

    options = {
        OptionRecommendation(name='prefer_author_sort',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('When present, use author sort field as author.')
        ),
        OptionRecommendation(name='no_inline_toc',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Don\'t add Table of Contents to the book. Useful if '
                'the book has its own table of contents.')),
        OptionRecommendation(name='toc_title', recommended_value=None,
            help=_('Title for any generated in-line table of contents.')
        ),
        OptionRecommendation(name='dont_compress',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Disable compression of the file contents.')
        ),
        OptionRecommendation(name='personal_doc', recommended_value='[PDOC]',
            help=_('Tag for MOBI files to be marked as personal documents.'
                ' This option has no effect on the conversion. It is used'
                ' only when sending MOBI files to a device. If the file'
                ' being sent has the specified tag, it will be marked as'
                ' a personal document when sent to the Kindle.')
        ),
        OptionRecommendation(name='mobi_ignore_margins',
            recommended_value=False,
            help=_('Ignore margins in the input document. If False, then '
                'the MOBI output plugin will try to convert margins specified'
                ' in the input document, otherwise it will ignore them.')
        ),
        OptionRecommendation(name='mobi_toc_at_start',
            recommended_value=False,
            help=_('When adding the Table of Contents to the book, add it at the start of the '
                'book instead of the end. Not recommended.')
        ),
        OptionRecommendation(name='extract_to',
            help=_('Extract the contents of the generated %s file to the '
                'specified directory. The contents of the directory are first '
                'deleted, so be careful.') % 'MOBI'
        ),
        OptionRecommendation(name='share_not_sync', recommended_value=False,
            help=_('Enable sharing of book content via Facebook etc. '
                ' on the Kindle. WARNING: Using this feature means that '
                ' the book will not auto sync its last read position '
                ' on multiple devices. Complain to Amazon.')
        ),
        OptionRecommendation(name='mobi_keep_original_images',
            recommended_value=False,
            help=_('By default calibre converts all images to JPEG format '
                'in the output MOBI file. This is for maximum compatibility '
                'as some older MOBI viewers have problems with other image '
                'formats. This option tells calibre not to do this. '
                'Useful if your document contains lots of GIF/PNG images that '
                'become very large when converted to JPEG.')),
        OptionRecommendation(name='mobi_file_type', choices=ui_data['file_types'], recommended_value='old',
            help=_('By default calibre generates MOBI files that contain the '
                'old MOBI 6 format. This format is compatible with all '
                'devices. However, by changing this setting, you can tell '
                'calibre to generate MOBI files that contain both MOBI 6 and '
                'the new KF8 format, or only the new KF8 format. KF8 has '
                'more features than MOBI 6, but only works with newer Kindles. '
                'Allowed values: {}').format('old, both, new')),
    }

    def check_for_periodical(self):
        if self.is_periodical:
            self.periodicalize_toc()
            self.check_for_masthead()
            self.opts.mobi_periodical = True
        else:
            self.opts.mobi_periodical = False

    def check_for_masthead(self):
        found = 'masthead' in self.oeb.guide
        if not found:
            from calibre.ebooks import generate_masthead
            self.oeb.log.debug('No masthead found in manifest, generating default mastheadImage...')
            raw = generate_masthead(unicode_type(self.oeb.metadata['title'][0]))
            id, href = self.oeb.manifest.generate('masthead', 'masthead')
            self.oeb.manifest.add(id, href, 'image/gif', data=raw)
            self.oeb.guide.add('masthead', 'Masthead Image', href)
        else:
            self.oeb.log.debug('Using mastheadImage supplied in manifest...')

    def periodicalize_toc(self):
        from calibre.ebooks.oeb.base import TOC
        toc = self.oeb.toc
        if not toc or len(self.oeb.spine) < 3:
            return
        if toc and toc[0].klass != 'periodical':
            one, two = self.oeb.spine[0], self.oeb.spine[1]
            self.log('Converting TOC for MOBI periodical indexing...')

            articles = {}
            if toc.depth() < 3:
                # single section periodical
                self.oeb.manifest.remove(one)
                self.oeb.manifest.remove(two)
                sections = [TOC(klass='section', title=_('All articles'),
                    href=self.oeb.spine[0].href)]
                for x in toc:
                    sections[0].nodes.append(x)
            else:
                # multi-section periodical
                self.oeb.manifest.remove(one)
                sections = list(toc)
                for i,x in enumerate(sections):
                    x.klass = 'section'
                    articles_ = list(x)
                    if articles_:
                        self.oeb.manifest.remove(self.oeb.manifest.hrefs[x.href])
                        x.href = articles_[0].href

            for sec in sections:
                articles[id(sec)] = []
                for a in list(sec):
                    a.klass = 'article'
|
||||||
|
articles[id(sec)].append(a)
|
||||||
|
sec.nodes.remove(a)
|
||||||
|
|
||||||
|
root = TOC(klass='periodical', href=self.oeb.spine[0].href,
|
||||||
|
title=unicode_type(self.oeb.metadata.title[0]))
|
||||||
|
|
||||||
|
for s in sections:
|
||||||
|
if articles[id(s)]:
|
||||||
|
for a in articles[id(s)]:
|
||||||
|
s.nodes.append(a)
|
||||||
|
root.nodes.append(s)
|
||||||
|
|
||||||
|
for x in list(toc.nodes):
|
||||||
|
toc.nodes.remove(x)
|
||||||
|
|
||||||
|
toc.nodes.append(root)
|
||||||
|
|
||||||
|
# Fix up the periodical href to point to first section href
|
||||||
|
toc.nodes[0].href = toc.nodes[0].nodes[0].href
|
||||||
|
|
||||||
|
def convert(self, oeb, output_path, input_plugin, opts, log):
|
||||||
|
from calibre.ebooks.mobi.writer2.resources import Resources
|
||||||
|
self.log, self.opts, self.oeb = log, opts, oeb
|
||||||
|
|
||||||
|
mobi_type = opts.mobi_file_type
|
||||||
|
if self.is_periodical:
|
||||||
|
mobi_type = 'old' # Amazon does not support KF8 periodicals
|
||||||
|
create_kf8 = mobi_type in ('new', 'both')
|
||||||
|
|
||||||
|
remove_html_cover(self.oeb, self.log)
|
||||||
|
resources = Resources(oeb, opts, self.is_periodical,
|
||||||
|
add_fonts=create_kf8)
|
||||||
|
self.check_for_periodical()
|
||||||
|
|
||||||
|
if create_kf8:
|
||||||
|
from calibre.ebooks.mobi.writer8.cleanup import remove_duplicate_anchors
|
||||||
|
remove_duplicate_anchors(self.oeb)
|
||||||
|
# Split on pagebreaks so that the resulting KF8 is faster to load
|
||||||
|
from calibre.ebooks.oeb.transforms.split import Split
|
||||||
|
Split()(self.oeb, self.opts)
|
||||||
|
|
||||||
|
kf8 = self.create_kf8(resources, for_joint=mobi_type=='both'
|
||||||
|
) if create_kf8 else None
|
||||||
|
if mobi_type == 'new':
|
||||||
|
kf8.write(output_path)
|
||||||
|
extract_mobi(output_path, opts)
|
||||||
|
return
|
||||||
|
|
||||||
|
self.log('Creating MOBI 6 output')
|
||||||
|
self.write_mobi(input_plugin, output_path, kf8, resources)
|
||||||
|
|
||||||
|
def create_kf8(self, resources, for_joint=False):
|
||||||
|
from calibre.ebooks.mobi.writer8.main import create_kf8_book
|
||||||
|
return create_kf8_book(self.oeb, self.opts, resources,
|
||||||
|
for_joint=for_joint)
|
||||||
|
|
||||||
|
def write_mobi(self, input_plugin, output_path, kf8, resources):
|
||||||
|
from calibre.ebooks.mobi.mobiml import MobiMLizer
|
||||||
|
from calibre.ebooks.oeb.transforms.manglecase import CaseMangler
|
||||||
|
from calibre.ebooks.oeb.transforms.rasterize import SVGRasterizer, Unavailable
|
||||||
|
from calibre.ebooks.oeb.transforms.htmltoc import HTMLTOCAdder
|
||||||
|
from calibre.customize.ui import plugin_for_input_format
|
||||||
|
|
||||||
|
opts, oeb = self.opts, self.oeb
|
||||||
|
if not opts.no_inline_toc:
|
||||||
|
tocadder = HTMLTOCAdder(title=opts.toc_title, position='start' if
|
||||||
|
opts.mobi_toc_at_start else 'end')
|
||||||
|
tocadder(oeb, opts)
|
||||||
|
mangler = CaseMangler()
|
||||||
|
mangler(oeb, opts)
|
||||||
|
try:
|
||||||
|
rasterizer = SVGRasterizer()
|
||||||
|
rasterizer(oeb, opts)
|
||||||
|
except Unavailable:
|
||||||
|
self.log.warn('SVG rasterizer unavailable, SVG will not be converted')
|
||||||
|
else:
|
||||||
|
# Add rasterized SVG images
|
||||||
|
resources.add_extra_images()
|
||||||
|
if hasattr(self.oeb, 'inserted_metadata_jacket'):
|
||||||
|
self.workaround_fire_bugs(self.oeb.inserted_metadata_jacket)
|
||||||
|
mobimlizer = MobiMLizer(ignore_tables=opts.linearize_tables)
|
||||||
|
mobimlizer(oeb, opts)
|
||||||
|
write_page_breaks_after_item = input_plugin is not plugin_for_input_format('cbz')
|
||||||
|
from calibre.ebooks.mobi.writer2.main import MobiWriter
|
||||||
|
writer = MobiWriter(opts, resources, kf8,
|
||||||
|
write_page_breaks_after_item=write_page_breaks_after_item)
|
||||||
|
writer(oeb, output_path)
|
||||||
|
extract_mobi(output_path, opts)
|
||||||
|
|
||||||
|
def specialize_css_for_output(self, log, opts, item, stylizer):
|
||||||
|
from calibre.ebooks.mobi.writer8.cleanup import CSSCleanup
|
||||||
|
CSSCleanup(log, opts)(item, stylizer)
|
||||||
|
|
||||||
|
def workaround_fire_bugs(self, jacket):
|
||||||
|
# The idiotic Fire crashes when trying to render the table used to
|
||||||
|
# layout the jacket
|
||||||
|
from calibre.ebooks.oeb.base import XHTML
|
||||||
|
for table in jacket.data.xpath('//*[local-name()="table"]'):
|
||||||
|
table.tag = XHTML('div')
|
||||||
|
for tr in table.xpath('descendant::*[local-name()="tr"]'):
|
||||||
|
cols = tr.xpath('descendant::*[local-name()="td"]')
|
||||||
|
tr.tag = XHTML('div')
|
||||||
|
for td in cols:
|
||||||
|
td.tag = XHTML('span' if cols else 'div')
|
||||||
|
|
||||||
|
|
||||||
|
class AZW3Output(OutputFormatPlugin):
|
||||||
|
|
||||||
|
name = 'AZW3 Output'
|
||||||
|
author = 'Kovid Goyal'
|
||||||
|
file_type = 'azw3'
|
||||||
|
commit_name = 'azw3_output'
|
||||||
|
|
||||||
|
options = {
|
||||||
|
OptionRecommendation(name='prefer_author_sort',
|
||||||
|
recommended_value=False, level=OptionRecommendation.LOW,
|
||||||
|
help=_('When present, use author sort field as author.')
|
||||||
|
),
|
||||||
|
OptionRecommendation(name='no_inline_toc',
|
||||||
|
recommended_value=False, level=OptionRecommendation.LOW,
|
||||||
|
help=_('Don\'t add Table of Contents to the book. Useful if '
|
||||||
|
'the book has its own table of contents.')),
|
||||||
|
OptionRecommendation(name='toc_title', recommended_value=None,
|
||||||
|
help=_('Title for any generated in-line table of contents.')
|
||||||
|
),
|
||||||
|
OptionRecommendation(name='dont_compress',
|
||||||
|
recommended_value=False, level=OptionRecommendation.LOW,
|
||||||
|
help=_('Disable compression of the file contents.')
|
||||||
|
),
|
||||||
|
OptionRecommendation(name='mobi_toc_at_start',
|
||||||
|
recommended_value=False,
|
||||||
|
help=_('When adding the Table of Contents to the book, add it at the start of the '
|
||||||
|
'book instead of the end. Not recommended.')
|
||||||
|
),
|
||||||
|
OptionRecommendation(name='extract_to',
|
||||||
|
help=_('Extract the contents of the generated %s file to the '
|
||||||
|
'specified directory. The contents of the directory are first '
|
||||||
|
'deleted, so be careful.') % 'AZW3'),
|
||||||
|
OptionRecommendation(name='share_not_sync', recommended_value=False,
|
||||||
|
help=_('Enable sharing of book content via Facebook etc. '
|
||||||
|
' on the Kindle. WARNING: Using this feature means that '
|
||||||
|
' the book will not auto sync its last read position '
|
||||||
|
' on multiple devices. Complain to Amazon.')
|
||||||
|
),
|
||||||
|
}
|
||||||
|
|
||||||
|
def convert(self, oeb, output_path, input_plugin, opts, log):
|
||||||
|
from calibre.ebooks.mobi.writer2.resources import Resources
|
||||||
|
from calibre.ebooks.mobi.writer8.main import create_kf8_book
|
||||||
|
from calibre.ebooks.mobi.writer8.cleanup import remove_duplicate_anchors
|
||||||
|
|
||||||
|
self.oeb, self.opts, self.log = oeb, opts, log
|
||||||
|
opts.mobi_periodical = self.is_periodical
|
||||||
|
passthrough = getattr(opts, 'mobi_passthrough', False)
|
||||||
|
remove_duplicate_anchors(oeb)
|
||||||
|
|
||||||
|
resources = Resources(self.oeb, self.opts, self.is_periodical,
|
||||||
|
add_fonts=True, process_images=False)
|
||||||
|
if not passthrough:
|
||||||
|
remove_html_cover(self.oeb, self.log)
|
||||||
|
|
||||||
|
# Split on pagebreaks so that the resulting KF8 is faster to load
|
||||||
|
from calibre.ebooks.oeb.transforms.split import Split
|
||||||
|
Split()(self.oeb, self.opts)
|
||||||
|
|
||||||
|
kf8 = create_kf8_book(self.oeb, self.opts, resources, for_joint=False)
|
||||||
|
|
||||||
|
kf8.write(output_path)
|
||||||
|
extract_mobi(output_path, opts)
|
||||||
|
|
||||||
|
def specialize_css_for_output(self, log, opts, item, stylizer):
|
||||||
|
from calibre.ebooks.mobi.writer8.cleanup import CSSCleanup
|
||||||
|
CSSCleanup(log, opts)(item, stylizer)
|
||||||
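The `periodicalize_toc` pass above regroups a flat TOC into the three-level periodical → section → article tree that MOBI periodical indexing expects, dropping empty sections. A toy sketch of the same regrouping using plain dicts (the function and structures here are illustrative, not calibre's real `TOC` API):

```python
def periodicalize(sections, title):
    """Wrap (section_title, [article_titles]) pairs into a
    periodical/section/article tree; empty sections are dropped,
    as periodicalize_toc() above does for sections with no articles."""
    root = {'klass': 'periodical', 'title': title, 'nodes': []}
    for sec_title, articles in sections:
        nodes = [{'klass': 'article', 'title': a} for a in articles]
        if nodes:
            root['nodes'].append(
                {'klass': 'section', 'title': sec_title, 'nodes': nodes})
    return root

tree = periodicalize([('News', ['a1', 'a2']), ('Empty', [])], 'My Paper')
```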
25  ebook_converter/ebooks/conversion/plugins/odt_input.py  Normal file
@@ -0,0 +1,25 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
Convert an ODT file into an Open Ebook
'''

from calibre.customize.conversion import InputFormatPlugin


class ODTInput(InputFormatPlugin):

    name = 'ODT Input'
    author = 'Kovid Goyal'
    description = 'Convert ODT (OpenOffice) files to HTML'
    file_types = {'odt'}
    commit_name = 'odt_input'

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from calibre.ebooks.odt.input import Extract
        return Extract()(stream, '.', log)
122  ebook_converter/ebooks/conversion/plugins/oeb_output.py  Normal file
@@ -0,0 +1,122 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import os, re


from calibre.customize.conversion import (OutputFormatPlugin,
        OptionRecommendation)
from calibre import CurrentDir


class OEBOutput(OutputFormatPlugin):

    name = 'OEB Output'
    author = 'Kovid Goyal'
    file_type = 'oeb'
    commit_name = 'oeb_output'

    recommendations = {('pretty_print', True, OptionRecommendation.HIGH)}

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        from polyglot.urllib import unquote
        from lxml import etree

        self.log, self.opts = log, opts
        if not os.path.exists(output_path):
            os.makedirs(output_path)
        from calibre.ebooks.oeb.base import OPF_MIME, NCX_MIME, PAGE_MAP_MIME, OEB_STYLES
        from calibre.ebooks.oeb.normalize_css import condense_sheet
        with CurrentDir(output_path):
            results = oeb_book.to_opf2(page_map=True)
            for key in (OPF_MIME, NCX_MIME, PAGE_MAP_MIME):
                href, root = results.pop(key, [None, None])
                if root is not None:
                    if key == OPF_MIME:
                        try:
                            self.workaround_nook_cover_bug(root)
                        except:
                            self.log.exception('Something went wrong while trying to'
                                    ' workaround Nook cover bug, ignoring')
                        try:
                            self.workaround_pocketbook_cover_bug(root)
                        except:
                            self.log.exception('Something went wrong while trying to'
                                    ' workaround Pocketbook cover bug, ignoring')
                        self.migrate_lang_code(root)
                    raw = etree.tostring(root, pretty_print=True,
                            encoding='utf-8', xml_declaration=True)
                    if key == OPF_MIME:
                        # Needed as I can't get lxml to output opf:role and
                        # not output <opf:metadata> as well
                        raw = re.sub(br'(<[/]{0,1})opf:', br'\1', raw)
                    with lopen(href, 'wb') as f:
                        f.write(raw)

            for item in oeb_book.manifest:
                if (
                        not self.opts.expand_css and item.media_type in OEB_STYLES and hasattr(
                            item.data, 'cssText') and 'nook' not in self.opts.output_profile.short_name):
                    condense_sheet(item.data)
                path = os.path.abspath(unquote(item.href))
                dir = os.path.dirname(path)
                if not os.path.exists(dir):
                    os.makedirs(dir)
                with lopen(path, 'wb') as f:
                    f.write(item.bytes_representation)
                item.unload_data_from_memory(memory=path)

    def workaround_nook_cover_bug(self, root):  # {{{
        cov = root.xpath('//*[local-name() = "meta" and @name="cover" and'
                ' @content != "cover"]')

        def manifest_items_with_id(id_):
            return root.xpath('//*[local-name() = "manifest"]/*[local-name() = "item" '
                ' and @id="%s"]' % id_)

        if len(cov) == 1:
            cov = cov[0]
            covid = cov.get('content', '')

            if covid:
                manifest_item = manifest_items_with_id(covid)
                if len(manifest_item) == 1 and \
                        manifest_item[0].get('media-type',
                                '').startswith('image/'):
                    self.log.warn('The cover image has an id != "cover". Renaming'
                            ' to work around bug in Nook Color')

                    from calibre.ebooks.oeb.base import uuid_id
                    newid = uuid_id()

                    for item in manifest_items_with_id('cover'):
                        item.set('id', newid)

                    for x in root.xpath('//*[@idref="cover"]'):
                        x.set('idref', newid)

                    manifest_item = manifest_item[0]
                    manifest_item.set('id', 'cover')
                    cov.set('content', 'cover')
    # }}}

    def workaround_pocketbook_cover_bug(self, root):  # {{{
        m = root.xpath('//*[local-name() = "manifest"]/*[local-name() = "item" '
                ' and @id="cover"]')
        if len(m) == 1:
            m = m[0]
            p = m.getparent()
            p.remove(m)
            p.insert(0, m)
    # }}}

    def migrate_lang_code(self, root):  # {{{
        from calibre.utils.localization import lang_as_iso639_1
        for lang in root.xpath('//*[local-name() = "language"]'):
            clc = lang_as_iso639_1(lang.text)
            if clc:
                lang.text = clc
    # }}}
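The `re.sub(br'(<[/]{0,1})opf:', br'\1', raw)` step in `OEBOutput.convert` strips `opf:` namespace prefixes from the serialized OPF, because lxml cannot be persuaded to emit `opf:role` without also emitting `<opf:metadata>`. The substitution can be checked in isolation (the sample bytes are illustrative):

```python
import re

raw = b'<metadata><opf:role>aut</opf:role><title>X</title></metadata>'
# The pattern matches '<opf:' and '</opf:', keeping only the '<' or '</'.
cleaned = re.sub(br'(<[/]{0,1})opf:', br'\1', raw)
print(cleaned)  # b'<metadata><role>aut</role><title>X</title></metadata>'
```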
37  ebook_converter/ebooks/conversion/plugins/pdb_input.py  Normal file
@@ -0,0 +1,37 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

from calibre.customize.conversion import InputFormatPlugin
from polyglot.builtins import getcwd


class PDBInput(InputFormatPlugin):

    name = 'PDB Input'
    author = 'John Schember'
    description = 'Convert PDB to HTML'
    file_types = {'pdb', 'updb'}
    commit_name = 'pdb_input'

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from calibre.ebooks.pdb.header import PdbHeaderReader
        from calibre.ebooks.pdb import PDBError, IDENTITY_TO_NAME, get_reader

        header = PdbHeaderReader(stream)
        Reader = get_reader(header.ident)

        if Reader is None:
            raise PDBError('No reader available for format within container.\n Identity is %s. Book type is %s' %
                    (header.ident, IDENTITY_TO_NAME.get(header.ident, _('Unknown'))))

        log.debug('Detected ebook format as: %s with identity: %s' % (IDENTITY_TO_NAME[header.ident], header.ident))

        reader = Reader(header, stream, log, options)
        opf = reader.extract_content(getcwd())

        return opf
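`get_reader()` above maps the identity string found in the PDB container's header to a reader class, returning `None` for unknown identities so the caller can raise `PDBError`. A minimal sketch of that dispatch pattern (the identity strings and reader names here are illustrative placeholders, not calibre's actual tables):

```python
# Hypothetical identity -> reader mapping; the real table lives in
# calibre.ebooks.pdb and maps to reader classes, not strings.
READERS = {
    'TEXtREAd': 'palmdoc_reader',
    'PNRdPPrs': 'ereader_reader',
}

def get_reader(ident):
    # Unknown identities yield None, letting the caller report the error.
    return READERS.get(ident)
```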
64  ebook_converter/ebooks/conversion/plugins/pdb_output.py  Normal file
@@ -0,0 +1,64 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import OutputFormatPlugin, \
    OptionRecommendation
from calibre.ebooks.pdb import PDBError, get_writer, ALL_FORMAT_WRITERS


class PDBOutput(OutputFormatPlugin):

    name = 'PDB Output'
    author = 'John Schember'
    file_type = 'pdb'
    commit_name = 'pdb_output'
    ui_data = {'formats': tuple(ALL_FORMAT_WRITERS)}

    options = {
        OptionRecommendation(name='format', recommended_value='doc',
            level=OptionRecommendation.LOW,
            short_switch='f', choices=list(ALL_FORMAT_WRITERS),
            help=(_('Format to use inside the pdb container. Choices are:') + ' %s' % sorted(ALL_FORMAT_WRITERS))),
        OptionRecommendation(name='pdb_output_encoding', recommended_value='cp1252',
            level=OptionRecommendation.LOW,
            help=_('Specify the character encoding of the output document. '
                'The default is cp1252. Note: This option is not honored by all '
                'formats.')),
        OptionRecommendation(name='inline_toc',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Add Table of Contents to beginning of the book.')),
    }

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        close = False
        if not hasattr(output_path, 'write'):
            close = True
            if not os.path.exists(os.path.dirname(output_path)) and os.path.dirname(output_path):
                os.makedirs(os.path.dirname(output_path))
            out_stream = lopen(output_path, 'wb')
        else:
            out_stream = output_path

        Writer = get_writer(opts.format)

        if Writer is None:
            # Report the requested format, not the builtin format()
            raise PDBError('No writer available for format %s.' % opts.format)

        setattr(opts, 'max_line_length', 0)
        setattr(opts, 'force_max_line_length', False)

        writer = Writer(opts, log)

        out_stream.seek(0)
        out_stream.truncate()

        writer.write_content(oeb_book, out_stream, oeb_book.metadata)

        if close:
            out_stream.close()
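`PDBOutput.convert()` accepts either a filesystem path or an already-open stream for `output_path`, duck-typing on `write()` and remembering whether it is responsible for closing. The same pattern in isolation (the helper name is illustrative):

```python
import io
import os

def open_output(output_path):
    """Return (stream, must_close). Objects with a write() method are
    used as-is; paths are opened in 'wb' mode, creating parent
    directories first, and must be closed by the caller."""
    if hasattr(output_path, 'write'):
        return output_path, False
    d = os.path.dirname(output_path)
    if d and not os.path.exists(d):
        os.makedirs(d)
    return open(output_path, 'wb'), True

buf = io.BytesIO()
stream, must_close = open_output(buf)  # stream is buf, must_close is False
```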
82  ebook_converter/ebooks/conversion/plugins/pdf_input.py  Normal file
@@ -0,0 +1,82 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import InputFormatPlugin, OptionRecommendation
from polyglot.builtins import as_bytes, getcwd


class PDFInput(InputFormatPlugin):

    name = 'PDF Input'
    author = 'Kovid Goyal and John Schember'
    description = 'Convert PDF files to HTML'
    file_types = {'pdf'}
    commit_name = 'pdf_input'

    options = {
        OptionRecommendation(name='no_images', recommended_value=False,
            help=_('Do not extract images from the document')),
        OptionRecommendation(name='unwrap_factor', recommended_value=0.45,
            help=_('Scale used to determine the length at which a line should '
                'be unwrapped. Valid values are a decimal between 0 and 1. The '
                'default is 0.45, just below the median line length.')),
        OptionRecommendation(name='new_pdf_engine', recommended_value=False,
            help=_('Use the new PDF conversion engine. Currently not operational.'))
    }

    def convert_new(self, stream, accelerators):
        from calibre.ebooks.pdf.pdftohtml import pdftohtml
        from calibre.utils.cleantext import clean_ascii_chars
        from calibre.ebooks.pdf.reflow import PDFDocument

        pdftohtml(getcwd(), stream.name, self.opts.no_images, as_xml=True)
        with lopen('index.xml', 'rb') as f:
            xml = clean_ascii_chars(f.read())
        PDFDocument(xml, self.opts, self.log)
        return os.path.join(getcwd(), 'metadata.opf')

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from calibre.ebooks.metadata.opf2 import OPFCreator
        from calibre.ebooks.pdf.pdftohtml import pdftohtml

        log.debug('Converting file to html...')
        # The main html file will be named index.html
        self.opts, self.log = options, log
        if options.new_pdf_engine:
            return self.convert_new(stream, accelerators)
        pdftohtml(getcwd(), stream.name, options.no_images)

        from calibre.ebooks.metadata.meta import get_metadata
        log.debug('Retrieving document metadata...')
        mi = get_metadata(stream, 'pdf')
        opf = OPFCreator(getcwd(), mi)

        manifest = [('index.html', None)]

        images = os.listdir(getcwd())
        images.remove('index.html')
        for i in images:
            manifest.append((i, None))
        log.debug('Generating manifest...')
        opf.create_manifest(manifest)

        opf.create_spine(['index.html'])
        log.debug('Rendering manifest...')
        with lopen('metadata.opf', 'wb') as opffile:
            opf.render(opffile)
        if os.path.exists('toc.ncx'):
            ncxid = opf.manifest.id_for_path('toc.ncx')
            if ncxid:
                with lopen('metadata.opf', 'r+b') as f:
                    raw = f.read().replace(b'<spine', b'<spine toc="%s"' % as_bytes(ncxid))
                    f.seek(0)
                    f.write(raw)

        return os.path.join(getcwd(), 'metadata.opf')
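The final step of `PDFInput.convert` injects a `toc` attribute into the rendered OPF's `<spine>` tag by plain byte substitution rather than re-parsing the XML. This works because `b'</spine>'` does not contain the substring `b'<spine'`, so only the opening tag is rewritten. The same substitution in isolation (the helper name and sample bytes are illustrative):

```python
def add_spine_toc(opf_bytes, ncx_id):
    # Only the opening <spine ...> tag matches; </spine> is left alone.
    return opf_bytes.replace(b'<spine', b'<spine toc="%s"' % ncx_id.encode('ascii'))

opf = b'<package><spine><itemref idref="a"/></spine></package>'
print(add_spine_toc(opf, 'ncx'))
# b'<package><spine toc="ncx"><itemref idref="a"/></spine></package>'
```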
256  ebook_converter/ebooks/conversion/plugins/pdf_output.py  Normal file
@@ -0,0 +1,256 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2012, Kovid Goyal <kovid at kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

'''
Convert OEB ebook format to PDF.
'''

import glob, os

from calibre.customize.conversion import (OutputFormatPlugin,
    OptionRecommendation)
from calibre.ptempfile import TemporaryDirectory
from polyglot.builtins import iteritems, unicode_type

UNITS = ('millimeter', 'centimeter', 'point', 'inch', 'pica', 'didot',
        'cicero', 'devicepixel')

PAPER_SIZES = ('a0', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'b0', 'b1',
        'b2', 'b3', 'b4', 'b5', 'b6', 'legal', 'letter')


class PDFOutput(OutputFormatPlugin):

    name = 'PDF Output'
    author = 'Kovid Goyal'
    file_type = 'pdf'
    commit_name = 'pdf_output'
    ui_data = {'paper_sizes': PAPER_SIZES, 'units': UNITS, 'font_types': ('serif', 'sans', 'mono')}

    options = {
        OptionRecommendation(name='use_profile_size', recommended_value=False,
            help=_('Instead of using the paper size specified in the PDF Output options,'
                ' use a paper size corresponding to the current output profile.'
                ' Useful if you want to generate a PDF for viewing on a specific device.')),
        OptionRecommendation(name='unit', recommended_value='inch',
            level=OptionRecommendation.LOW, short_switch='u', choices=UNITS,
            help=_('The unit of measure for page sizes. Default is inch. Choices '
                'are {} '
                'Note: This does not override the unit for margins!').format(', '.join(UNITS))),
        OptionRecommendation(name='paper_size', recommended_value='letter',
            level=OptionRecommendation.LOW, choices=PAPER_SIZES,
            help=_('The size of the paper. This size will be overridden when a '
                'non default output profile is used. Default is letter. Choices '
                'are {}').format(', '.join(PAPER_SIZES))),
        OptionRecommendation(name='custom_size', recommended_value=None,
            help=_('Custom size of the document. Use the form widthxheight '
                'e.g. `123x321` to specify the width and height. '
                'This overrides any specified paper-size.')),
        OptionRecommendation(name='preserve_cover_aspect_ratio',
            recommended_value=False,
            help=_('Preserve the aspect ratio of the cover, instead'
                ' of stretching it to fill the full first page of the'
                ' generated pdf.')),
        OptionRecommendation(name='pdf_serif_family',
            recommended_value='Times', help=_(
                'The font family used to render serif fonts. Will work only if the font is available system-wide.')),
        OptionRecommendation(name='pdf_sans_family',
            recommended_value='Helvetica', help=_(
                'The font family used to render sans-serif fonts. Will work only if the font is available system-wide.')),
        OptionRecommendation(name='pdf_mono_family',
            recommended_value='Courier', help=_(
                'The font family used to render monospace fonts. Will work only if the font is available system-wide.')),
        OptionRecommendation(name='pdf_standard_font', choices=ui_data['font_types'],
            recommended_value='serif', help=_(
                'The font type used to render standard text')),
        OptionRecommendation(name='pdf_default_font_size',
            recommended_value=20, help=_(
                'The default font size')),
        OptionRecommendation(name='pdf_mono_font_size',
            recommended_value=16, help=_(
                'The default font size for monospaced text')),
        OptionRecommendation(name='pdf_hyphenate', recommended_value=False,
            help=_('Break long words at the end of lines. This can give the text at the right margin a more even appearance.')),
        OptionRecommendation(name='pdf_mark_links', recommended_value=False,
            help=_('Surround all links with a red box, useful for debugging.')),
        OptionRecommendation(name='pdf_page_numbers', recommended_value=False,
            help=_('Add page numbers to the bottom of every page in the generated PDF file. If you '
                'specify a footer template, it will take precedence '
                'over this option.')),
        OptionRecommendation(name='pdf_footer_template', recommended_value=None,
            help=_('An HTML template used to generate %s on every page.'
                ' The strings _PAGENUM_, _TITLE_, _AUTHOR_ and _SECTION_ will be replaced by their current values.') % _('footers')),
        OptionRecommendation(name='pdf_header_template', recommended_value=None,
            help=_('An HTML template used to generate %s on every page.'
                ' The strings _PAGENUM_, _TITLE_, _AUTHOR_ and _SECTION_ will be replaced by their current values.') % _('headers')),
        OptionRecommendation(name='pdf_add_toc', recommended_value=False,
            help=_('Add a Table of Contents at the end of the PDF that lists page numbers. '
                'Useful if you want to print out the PDF. If this PDF is intended for electronic use, use the PDF Outline instead.')),
        OptionRecommendation(name='toc_title', recommended_value=None,
            help=_('Title for generated table of contents.')
        ),

        OptionRecommendation(name='pdf_page_margin_left', recommended_value=72.0,
            level=OptionRecommendation.LOW,
            help=_('The size of the left page margin, in pts. Default is 72pt.'
                ' Overrides the common left page margin setting.')
        ),

        OptionRecommendation(name='pdf_page_margin_top', recommended_value=72.0,
            level=OptionRecommendation.LOW,
            help=_('The size of the top page margin, in pts. Default is 72pt.'
                ' Overrides the common top page margin setting, unless set to zero.')
        ),

        OptionRecommendation(name='pdf_page_margin_right', recommended_value=72.0,
            level=OptionRecommendation.LOW,
            help=_('The size of the right page margin, in pts. Default is 72pt.'
                ' Overrides the common right page margin setting, unless set to zero.')
        ),

        OptionRecommendation(name='pdf_page_margin_bottom', recommended_value=72.0,
            level=OptionRecommendation.LOW,
            help=_('The size of the bottom page margin, in pts. Default is 72pt.'
                ' Overrides the common bottom page margin setting, unless set to zero.')
        ),
        OptionRecommendation(name='pdf_use_document_margins', recommended_value=False,
            help=_('Use the page margins specified in the input document via @page CSS rules.'
                ' This will cause the margins specified in the conversion settings to be ignored.'
                ' If the document does not specify page margins, the conversion settings will be used as a fallback.')
        ),
        OptionRecommendation(name='pdf_page_number_map', recommended_value=None,
            help=_('Adjust page numbers, as needed. Syntax is a JavaScript expression for the page number.'
                ' For example, "if (n < 3) 0; else n - 3;", where n is current page number.')
        ),
        OptionRecommendation(name='uncompressed_pdf',
            recommended_value=False, help=_(
                'Generate an uncompressed PDF, useful for debugging.')
        ),
        OptionRecommendation(name='pdf_odd_even_offset', recommended_value=0.0,
            level=OptionRecommendation.LOW,
            help=_(
                'Shift the text horizontally by the specified offset (in pts).'
                ' On odd numbered pages, it is shifted to the right and on even'
                ' numbered pages to the left. Use negative numbers for the opposite'
                ' effect. Note that this setting is ignored on pages where the margins'
                ' are smaller than the specified offset. Shifting is done by setting'
                ' the PDF CropBox, not all software respects the CropBox.'
            )
        ),

    }

    def specialize_options(self, log, opts, input_fmt):
        # Ensure Qt is setup to be used with WebEngine
        # specialize_options is called early enough in the pipeline
        # that hopefully no Qt application has been constructed as yet
        from PyQt5.QtWebEngineCore import QWebEngineUrlScheme
        from PyQt5.QtWebEngineWidgets import QWebEnginePage  # noqa
        from calibre.gui2 import must_use_qt
        from calibre.constants import FAKE_PROTOCOL
        scheme = QWebEngineUrlScheme(FAKE_PROTOCOL.encode('ascii'))
        scheme.setSyntax(QWebEngineUrlScheme.Syntax.Host)
        scheme.setFlags(QWebEngineUrlScheme.SecureScheme)
        QWebEngineUrlScheme.registerScheme(scheme)
        must_use_qt()
        self.input_fmt = input_fmt

        if opts.pdf_use_document_margins:
            # Prevent the conversion pipeline from overwriting document margins
            opts.margin_left = opts.margin_right = opts.margin_top = opts.margin_bottom = -1

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
|
self.stored_page_margins = getattr(opts, '_stored_page_margins', {})
|
||||||
|
|
||||||
|
self.oeb = oeb_book
|
||||||
|
self.input_plugin, self.opts, self.log = input_plugin, opts, log
|
||||||
|
self.output_path = output_path
|
||||||
|
from calibre.ebooks.oeb.base import OPF, OPF2_NS
|
||||||
|
from lxml import etree
|
||||||
|
from io import BytesIO
|
||||||
|
package = etree.Element(OPF('package'),
|
||||||
|
attrib={'version': '2.0', 'unique-identifier': 'dummy'},
|
||||||
|
nsmap={None: OPF2_NS})
|
||||||
|
from calibre.ebooks.metadata.opf2 import OPF
|
||||||
|
self.oeb.metadata.to_opf2(package)
|
||||||
|
self.metadata = OPF(BytesIO(etree.tostring(package))).to_book_metadata()
|
||||||
|
self.cover_data = None
|
||||||
|
|
||||||
|
if input_plugin.is_image_collection:
|
||||||
|
log.debug('Converting input as an image collection...')
|
||||||
|
self.convert_images(input_plugin.get_images())
|
||||||
|
else:
|
||||||
|
log.debug('Converting input as a text based book...')
|
||||||
|
self.convert_text(oeb_book)
|
||||||
|
|
||||||
|
def convert_images(self, images):
|
||||||
|
from calibre.ebooks.pdf.image_writer import convert
|
||||||
|
convert(images, self.output_path, self.opts, self.metadata, self.report_progress)
|
||||||
|
|
||||||
|
def get_cover_data(self):
|
||||||
|
oeb = self.oeb
|
||||||
|
if (oeb.metadata.cover and unicode_type(oeb.metadata.cover[0]) in oeb.manifest.ids):
|
||||||
|
cover_id = unicode_type(oeb.metadata.cover[0])
|
||||||
|
item = oeb.manifest.ids[cover_id]
|
||||||
|
self.cover_data = item.data
|
||||||
|
|
||||||
|
def process_fonts(self):
|
||||||
|
''' Make sure all fonts are embeddable '''
|
||||||
|
from calibre.ebooks.oeb.base import urlnormalize
|
||||||
|
from calibre.utils.fonts.utils import remove_embed_restriction
|
||||||
|
|
||||||
|
processed = set()
|
||||||
|
for item in list(self.oeb.manifest):
|
||||||
|
if not hasattr(item.data, 'cssRules'):
|
||||||
|
continue
|
||||||
|
for i, rule in enumerate(item.data.cssRules):
|
||||||
|
if rule.type == rule.FONT_FACE_RULE:
|
||||||
|
try:
|
||||||
|
s = rule.style
|
||||||
|
src = s.getProperty('src').propertyValue[0].uri
|
||||||
|
except:
|
||||||
|
continue
|
||||||
|
path = item.abshref(src)
|
||||||
|
ff = self.oeb.manifest.hrefs.get(urlnormalize(path), None)
|
||||||
|
if ff is None:
|
||||||
|
continue
|
||||||
|
|
||||||
|
raw = nraw = ff.data
|
||||||
|
if path not in processed:
|
||||||
|
processed.add(path)
|
||||||
|
try:
|
||||||
|
nraw = remove_embed_restriction(raw)
|
||||||
|
except:
|
||||||
|
continue
|
||||||
|
if nraw != raw:
|
||||||
|
ff.data = nraw
|
||||||
|
self.oeb.container.write(path, nraw)
|
||||||
|
|
||||||
|
def convert_text(self, oeb_book):
|
||||||
|
import json
|
||||||
|
from calibre.ebooks.pdf.html_writer import convert
|
||||||
|
self.get_cover_data()
|
||||||
|
self.process_fonts()
|
||||||
|
|
||||||
|
if self.opts.pdf_use_document_margins and self.stored_page_margins:
|
||||||
|
for href, margins in iteritems(self.stored_page_margins):
|
||||||
|
item = oeb_book.manifest.hrefs.get(href)
|
||||||
|
if item is not None:
|
||||||
|
root = item.data
|
||||||
|
if hasattr(root, 'xpath') and margins:
|
||||||
|
root.set('data-calibre-pdf-output-page-margins', json.dumps(margins))
|
||||||
|
|
||||||
|
with TemporaryDirectory('_pdf_out') as oeb_dir:
|
||||||
|
from calibre.customize.ui import plugin_for_output_format
|
||||||
|
oeb_dir = os.path.realpath(oeb_dir)
|
||||||
|
oeb_output = plugin_for_output_format('oeb')
|
||||||
|
oeb_output.convert(oeb_book, oeb_dir, self.input_plugin, self.opts, self.log)
|
||||||
|
opfpath = glob.glob(os.path.join(oeb_dir, '*.opf'))[0]
|
||||||
|
convert(
|
||||||
|
opfpath, self.opts, metadata=self.metadata, output_path=self.output_path,
|
||||||
|
log=self.log, cover_data=self.cover_data, report_progress=self.report_progress
|
||||||
|
)
|
||||||
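The pdf_odd_even_offset help text above packs three rules into one sentence. A minimal sketch (my own helper, not calibre's code) of the described behaviour, assuming "ignored" means a zero shift when the margin is smaller than the offset:

```python
def odd_even_shift(page_number, offset_pts, margin_pts):
    # Illustrative sketch of the pdf_odd_even_offset rule described above:
    # odd pages shift right, even pages shift left, and pages whose margin
    # is smaller than the requested offset are left untouched.
    if margin_pts < abs(offset_pts):
        return 0.0
    return offset_pts if page_number % 2 == 1 else -offset_pts
```

A negative offset simply flips the sign, giving the "opposite effect" the option describes.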
165  ebook_converter/ebooks/conversion/plugins/pml_input.py  Normal file
@@ -0,0 +1,165 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import glob
import os
import shutil

from calibre.customize.conversion import InputFormatPlugin
from calibre.ptempfile import TemporaryDirectory
from polyglot.builtins import getcwd


class PMLInput(InputFormatPlugin):

    name = 'PML Input'
    author = 'John Schember'
    description = 'Convert PML to OEB'
    # pmlz is a zip file containing pml files and png images.
    file_types = {'pml', 'pmlz'}
    commit_name = 'pml_input'

    def process_pml(self, pml_path, html_path, close_all=False):
        from calibre.ebooks.pml.pmlconverter import PML_HTMLizer

        pclose = False
        hclose = False

        if not hasattr(pml_path, 'read'):
            pml_stream = lopen(pml_path, 'rb')
            pclose = True
        else:
            pml_stream = pml_path
            pml_stream.seek(0)

        if not hasattr(html_path, 'write'):
            html_stream = lopen(html_path, 'wb')
            hclose = True
        else:
            html_stream = html_path

        ienc = getattr(pml_stream, 'encoding', None)
        if ienc is None:
            ienc = 'cp1252'
        if self.options.input_encoding:
            ienc = self.options.input_encoding

        self.log.debug('Converting PML to HTML...')
        hizer = PML_HTMLizer()
        html = hizer.parse_pml(pml_stream.read().decode(ienc), html_path)
        html = '<html><head><title></title></head><body>%s</body></html>' % html
        html_stream.write(html.encode('utf-8', 'replace'))

        if pclose:
            pml_stream.close()
        if hclose:
            html_stream.close()

        return hizer.get_toc()

    def get_images(self, stream, tdir, top_level=False):
        images = []
        imgs = []

        if top_level:
            imgs = glob.glob(os.path.join(tdir, '*.png'))
        # Images not in the top level: try the bookname_img directory,
        # because that's where Dropbook likes to see them.
        if not imgs:
            if hasattr(stream, 'name'):
                imgs = glob.glob(os.path.join(tdir, os.path.splitext(os.path.basename(stream.name))[0] + '_img', '*.png'))
            # No images in the Dropbook location: try the generic images directory.
            if not imgs:
                imgs = glob.glob(os.path.join(tdir, 'images', '*.png'))
        if imgs:
            os.makedirs(os.path.join(getcwd(), 'images'))
        for img in imgs:
            pimg_name = os.path.basename(img)
            pimg_path = os.path.join(getcwd(), 'images', pimg_name)

            images.append('images/' + pimg_name)

            shutil.copy(img, pimg_path)

        return images

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from calibre.ebooks.metadata.toc import TOC
        from calibre.ebooks.metadata.opf2 import OPFCreator
        from calibre.utils.zipfile import ZipFile

        self.options = options
        self.log = log
        pages, images = [], []
        toc = TOC()

        if file_ext == 'pmlz':
            log.debug('De-compressing content to temporary directory...')
            with TemporaryDirectory('_unpmlz') as tdir:
                zf = ZipFile(stream)
                zf.extractall(tdir)

                pmls = glob.glob(os.path.join(tdir, '*.pml'))
                for pml in pmls:
                    html_name = os.path.splitext(os.path.basename(pml))[0] + '.html'
                    html_path = os.path.join(getcwd(), html_name)

                    pages.append(html_name)
                    log.debug('Processing PML item %s...' % pml)
                    ttoc = self.process_pml(pml, html_path)
                    toc += ttoc
                images = self.get_images(stream, tdir, True)
        else:
            toc = self.process_pml(stream, 'index.html')
            pages.append('index.html')

            if hasattr(stream, 'name'):
                images = self.get_images(stream, os.path.abspath(os.path.dirname(stream.name)))

        # We want pages to be ordered alphabetically.
        pages.sort()

        manifest_items = []
        for item in pages + images:
            manifest_items.append((item, None))

        from calibre.ebooks.metadata.meta import get_metadata
        log.debug('Reading metadata from input file...')
        mi = get_metadata(stream, 'pml')
        if 'images/cover.png' in images:
            mi.cover = 'images/cover.png'
        opf = OPFCreator(getcwd(), mi)
        log.debug('Generating manifest...')
        opf.create_manifest(manifest_items)
        opf.create_spine(pages)
        opf.set_toc(toc)
        with lopen('metadata.opf', 'wb') as opffile:
            with lopen('toc.ncx', 'wb') as tocfile:
                opf.render(opffile, tocfile, 'toc.ncx')

        return os.path.join(getcwd(), 'metadata.opf')

    def postprocess_book(self, oeb, opts, log):
        from calibre.ebooks.oeb.base import XHTML, barename
        for item in oeb.spine:
            if hasattr(item.data, 'xpath'):
                for heading in item.data.iterdescendants(*map(XHTML, 'h1 h2 h3 h4 h5 h6'.split())):
                    if not len(heading):
                        continue
                    span = heading[0]
                    if not heading.text and not span.text and not len(span) and barename(span.tag) == 'span':
                        if not heading.get('id') and span.get('id'):
                            heading.set('id', span.get('id'))
                        heading.text = span.tail
                        heading.remove(span)
                    if len(heading) == 1 and heading[0].get('style') == 'text-align: center; margin: auto;':
                        div = heading[0]
                        if barename(div.tag) == 'div' and not len(div) and not div.get('id') and not heading.get('style'):
                            heading.text = (heading.text or '') + (div.text or '') + (div.tail or '')
                            heading.remove(div)
                            heading.set('style', 'text-align: center')
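The encoding selection in PMLInput.process_pml is easy to misread because the user option is checked last. A self-contained sketch of the same fallback order (my own helper name):

```python
import io


def choose_input_encoding(pml_stream, option_encoding=None):
    # Sketch of the fallback order used in PMLInput.process_pml above:
    # use the stream's own encoding if it has one, otherwise default to
    # cp1252; an explicit input_encoding option always wins.
    ienc = getattr(pml_stream, 'encoding', None)
    if ienc is None:
        ienc = 'cp1252'
    if option_encoding:
        ienc = option_encoding
    return ienc
```

Because the option is applied after both fallbacks, a user-supplied encoding overrides even a stream that declares its own.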
77  ebook_converter/ebooks/conversion/plugins/pml_output.py  Normal file
@@ -0,0 +1,77 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import io
import os

from calibre.customize.conversion import (OutputFormatPlugin,
    OptionRecommendation)
from calibre.ptempfile import TemporaryDirectory
from polyglot.builtins import unicode_type


class PMLOutput(OutputFormatPlugin):

    name = 'PML Output'
    author = 'John Schember'
    file_type = 'pmlz'
    commit_name = 'pml_output'

    options = {
        OptionRecommendation(name='pml_output_encoding', recommended_value='cp1252',
            level=OptionRecommendation.LOW,
            help=_('Specify the character encoding of the output document. '
                   'The default is cp1252.')),
        OptionRecommendation(name='inline_toc',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Add a Table of Contents to the beginning of the book.')),
        OptionRecommendation(name='full_image_depth',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Do not reduce the size or bit depth of images. Images '
                   'have their size and depth reduced by default to accommodate '
                   'applications that cannot convert images on their '
                   'own, such as Dropbook.')),
    }

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        from calibre.ebooks.pml.pmlml import PMLMLizer
        from calibre.utils.zipfile import ZipFile

        with TemporaryDirectory('_pmlz_output') as tdir:
            pmlmlizer = PMLMLizer(log)
            pml = unicode_type(pmlmlizer.extract_content(oeb_book, opts))
            with lopen(os.path.join(tdir, 'index.pml'), 'wb') as out:
                out.write(pml.encode(opts.pml_output_encoding, 'replace'))

            img_path = os.path.join(tdir, 'index_img')
            if not os.path.exists(img_path):
                os.makedirs(img_path)
            self.write_images(oeb_book.manifest, pmlmlizer.image_hrefs, img_path, opts)

            log.debug('Compressing output...')
            pmlz = ZipFile(output_path, 'w')
            pmlz.add_dir(tdir)

    def write_images(self, manifest, image_hrefs, out_dir, opts):
        from PIL import Image

        from calibre.ebooks.oeb.base import OEB_RASTER_IMAGES
        for item in manifest:
            if item.media_type in OEB_RASTER_IMAGES and item.href in image_hrefs.keys():
                if opts.full_image_depth:
                    im = Image.open(io.BytesIO(item.data))
                else:
                    im = Image.open(io.BytesIO(item.data)).convert('P')
                    im.thumbnail((300, 300), Image.ANTIALIAS)

                data = io.BytesIO()
                im.save(data, 'PNG')
                data = data.getvalue()

                path = os.path.join(out_dir, image_hrefs[item.href])

                with lopen(path, 'wb') as out:
                    out.write(data)
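PMLOutput builds a fixed on-disk layout before zipping: index.pml in the chosen encoding plus an index_img/ directory of PNGs. A hypothetical stdlib-only sketch of that .pmlz layout (helper name and in-memory approach are my own; the plugin writes to a temporary directory instead):

```python
import io
import zipfile


def make_pmlz(pml_text, images, encoding='cp1252'):
    # Mirror the archive layout PMLOutput produces above: index.pml
    # encoded with the output encoding (unencodable characters replaced),
    # plus an index_img/ directory of PNG images, zipped into a .pmlz.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('index.pml', pml_text.encode(encoding, 'replace'))
        for name, png_bytes in images.items():
            zf.writestr('index_img/' + name, png_bytes)
    return buf.getvalue()
```

Readers such as Dropbook expect exactly these names, which is why the plugin hard-codes them rather than deriving them from the book title.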
28  ebook_converter/ebooks/conversion/plugins/rb_input.py  Normal file
@@ -0,0 +1,28 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'


from calibre.customize.conversion import InputFormatPlugin
from polyglot.builtins import getcwd


class RBInput(InputFormatPlugin):

    name = 'RB Input'
    author = 'John Schember'
    description = 'Convert RB files to HTML'
    file_types = {'rb'}
    commit_name = 'rb_input'

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from calibre.ebooks.rb.reader import Reader

        reader = Reader(stream, log, options.input_encoding)
        opf = reader.extract_content(getcwd())

        return opf
45  ebook_converter/ebooks/conversion/plugins/rb_output.py  Normal file
@@ -0,0 +1,45 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import OutputFormatPlugin, OptionRecommendation


class RBOutput(OutputFormatPlugin):

    name = 'RB Output'
    author = 'John Schember'
    file_type = 'rb'
    commit_name = 'rb_output'

    options = {
        OptionRecommendation(name='inline_toc',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Add a Table of Contents to the beginning of the book.'))}

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        from calibre.ebooks.rb.writer import RBWriter

        close = False
        if not hasattr(output_path, 'write'):
            close = True
            if not os.path.exists(os.path.dirname(output_path)) and os.path.dirname(output_path):
                os.makedirs(os.path.dirname(output_path))
            out_stream = lopen(output_path, 'wb')
        else:
            out_stream = output_path

        writer = RBWriter(opts, log)

        out_stream.seek(0)
        out_stream.truncate()

        writer.write_content(oeb_book, out_stream, oeb_book.metadata)

        if close:
            out_stream.close()
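Several of these plugins share the same path-or-stream convention seen in RBOutput.convert. A stdlib-only sketch of that pattern (the helper name is my own):

```python
import io
import os


def open_output(output_path):
    # Sketch of the path-or-stream handling in RBOutput.convert above:
    # a plain path is opened in binary mode, creating missing parent
    # directories first, and the caller must close it later; an object
    # that already has a write() method is used as-is and left open.
    if not hasattr(output_path, 'write'):
        d = os.path.dirname(output_path)
        if d and not os.path.exists(d):
            os.makedirs(d)
        return open(output_path, 'wb'), True
    return output_path, False
```

Returning the close flag alongside the stream keeps ownership explicit: the function only closes what it itself opened.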
169  ebook_converter/ebooks/conversion/plugins/recipe_input.py  Normal file
@@ -0,0 +1,169 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import InputFormatPlugin, OptionRecommendation
from calibre.constants import numeric_version
from calibre import walk
from polyglot.builtins import unicode_type


class RecipeDisabled(Exception):
    pass


class RecipeInput(InputFormatPlugin):

    name = 'Recipe Input'
    author = 'Kovid Goyal'
    description = _('Download periodical content from the internet')
    file_types = {'recipe', 'downloaded_recipe'}
    commit_name = 'recipe_input'

    recommendations = {
        ('chapter', None, OptionRecommendation.HIGH),
        ('dont_split_on_page_breaks', True, OptionRecommendation.HIGH),
        ('use_auto_toc', False, OptionRecommendation.HIGH),
        ('input_encoding', None, OptionRecommendation.HIGH),
        ('input_profile', 'default', OptionRecommendation.HIGH),
        ('page_breaks_before', None, OptionRecommendation.HIGH),
        ('insert_metadata', False, OptionRecommendation.HIGH),
    }

    options = {
        OptionRecommendation(name='test', recommended_value=False,
            help=_(
                'Useful for recipe development. Forces'
                ' max_articles_per_feed to 2 and downloads at most 2 feeds.'
                ' You can change the number of feeds and articles by supplying optional arguments.'
                ' For example: --test 3 1 will download at most 3 feeds and only 1 article per feed.')),
        OptionRecommendation(name='username', recommended_value=None,
            help=_('Username for sites that require a login to access '
                   'content.')),
        OptionRecommendation(name='password', recommended_value=None,
            help=_('Password for sites that require a login to access '
                   'content.')),
        OptionRecommendation(name='dont_download_recipe',
            recommended_value=False,
            help=_('Do not download the latest version of builtin recipes from the calibre server')),
        OptionRecommendation(name='lrf', recommended_value=False,
            help='Optimize fetching for subsequent conversion to LRF.'),
    }

    def convert(self, recipe_or_file, opts, file_ext, log,
                accelerators):
        from calibre.web.feeds.recipes import compile_recipe
        opts.output_profile.flow_size = 0
        if file_ext == 'downloaded_recipe':
            from calibre.utils.zipfile import ZipFile
            zf = ZipFile(recipe_or_file, 'r')
            zf.extractall()
            zf.close()
            with lopen('download.recipe', 'rb') as f:
                self.recipe_source = f.read()
            recipe = compile_recipe(self.recipe_source)
            recipe.needs_subscription = False
            self.recipe_object = recipe(opts, log, self.report_progress)
        else:
            if os.environ.get('CALIBRE_RECIPE_URN'):
                from calibre.web.feeds.recipes.collection import get_custom_recipe, get_builtin_recipe_by_id
                urn = os.environ['CALIBRE_RECIPE_URN']
                log('Downloading recipe urn: ' + urn)
                rtype, recipe_id = urn.partition(':')[::2]
                if not recipe_id:
                    raise ValueError('Invalid recipe urn: ' + urn)
                if rtype == 'custom':
                    self.recipe_source = get_custom_recipe(recipe_id)
                else:
                    self.recipe_source = get_builtin_recipe_by_id(urn, log=log, download_recipe=True)
                if not self.recipe_source:
                    raise ValueError('Could not find recipe with urn: ' + urn)
                if not isinstance(self.recipe_source, bytes):
                    self.recipe_source = self.recipe_source.encode('utf-8')
                recipe = compile_recipe(self.recipe_source)
            elif os.access(recipe_or_file, os.R_OK):
                with lopen(recipe_or_file, 'rb') as f:
                    self.recipe_source = f.read()
                recipe = compile_recipe(self.recipe_source)
                log('Using custom recipe')
            else:
                from calibre.web.feeds.recipes.collection import (
                    get_builtin_recipe_by_title, get_builtin_recipe_titles)
                title = getattr(opts, 'original_recipe_input_arg', recipe_or_file)
                title = os.path.basename(title).rpartition('.')[0]
                titles = frozenset(get_builtin_recipe_titles())
                if title not in titles:
                    title = getattr(opts, 'original_recipe_input_arg', recipe_or_file)
                    title = title.rpartition('.')[0]

                raw = get_builtin_recipe_by_title(title, log=log,
                        download_recipe=not opts.dont_download_recipe)
                builtin = False
                try:
                    recipe = compile_recipe(raw)
                    self.recipe_source = raw
                    if recipe.requires_version > numeric_version:
                        log.warn(
                            'Downloaded recipe needs calibre version at least: %s' %
                            ('.'.join(map(str, recipe.requires_version))))
                        builtin = True
                except Exception:
                    log.exception('Failed to compile downloaded recipe. Falling '
                                  'back to builtin one')
                    builtin = True
                if builtin:
                    log('Using bundled builtin recipe')
                    raw = get_builtin_recipe_by_title(title, log=log,
                            download_recipe=False)
                    if raw is None:
                        raise ValueError('Failed to find builtin recipe: ' + title)
                    recipe = compile_recipe(raw)
                    self.recipe_source = raw
                else:
                    log('Using downloaded builtin recipe')

        if recipe is None:
            raise ValueError('%r is not a valid recipe file or builtin recipe' %
                    recipe_or_file)

        disabled = getattr(recipe, 'recipe_disabled', None)
        if disabled is not None:
            raise RecipeDisabled(disabled)
        ro = recipe(opts, log, self.report_progress)
        ro.download()
        self.recipe_object = ro

        for key, val in self.recipe_object.conversion_options.items():
            setattr(opts, key, val)

        for f in os.listdir('.'):
            if f.endswith('.opf'):
                return os.path.abspath(f)

        for f in walk('.'):
            if f.endswith('.opf'):
                return os.path.abspath(f)

    def postprocess_book(self, oeb, opts, log):
        if self.recipe_object is not None:
            self.recipe_object.internal_postprocess_book(oeb, opts, log)
            self.recipe_object.postprocess_book(oeb, opts, log)

    def specialize(self, oeb, opts, log, output_fmt):
        if opts.no_inline_navbars:
            from calibre.ebooks.oeb.base import XPath
            for item in oeb.spine:
                for div in XPath('//h:div[contains(@class, "calibre_navbar")]')(item.data):
                    div.getparent().remove(div)

    def save_download(self, zf):
        raw = self.recipe_source
        if isinstance(raw, unicode_type):
            raw = raw.encode('utf-8')
        zf.writestr('download.recipe', raw)
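The CALIBRE_RECIPE_URN branch in RecipeInput.convert relies on a compact str.partition idiom to split a urn like "custom:1234" into a type and an id. A self-contained sketch of just that parsing step (the function name is my own):

```python
def parse_recipe_urn(urn):
    # Mirror the urn parsing in RecipeInput.convert above:
    # partition(':') yields (head, sep, tail); taking [::2] keeps the
    # head and tail, discarding the separator. A urn with no id part
    # is rejected, matching the ValueError raised by the plugin.
    rtype, recipe_id = urn.partition(':')[::2]
    if not recipe_id:
        raise ValueError('Invalid recipe urn: ' + urn)
    return rtype, recipe_id
```

Note that for anything other than a 'custom' urn, the plugin passes the whole urn, not just the id, to get_builtin_recipe_by_id.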
323  ebook_converter/ebooks/conversion/plugins/rtf_input.py  Normal file
@@ -0,0 +1,323 @@
from __future__ import with_statement, unicode_literals
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'

import glob
import os
import re
import textwrap

from calibre.customize.conversion import InputFormatPlugin, OptionRecommendation
from polyglot.builtins import iteritems, filter, getcwd, as_bytes

border_style_map = {
    'single': 'solid',
    'double-thickness-border': 'double',
    'shadowed-border': 'outset',
    'double-border': 'double',
    'dotted-border': 'dotted',
    'dashed': 'dashed',
    'hairline': 'solid',
    'inset': 'inset',
    'dash-small': 'dashed',
    'dot-dash': 'dotted',
    'dot-dot-dash': 'dotted',
    'outset': 'outset',
    'tripple': 'double',
    'triple': 'double',
    'thick-thin-small': 'solid',
    'thin-thick-small': 'solid',
    'thin-thick-thin-small': 'solid',
    'thick-thin-medium': 'solid',
    'thin-thick-medium': 'solid',
    'thin-thick-thin-medium': 'solid',
    'thick-thin-large': 'solid',
    'thin-thick-thin-large': 'solid',
    'wavy': 'ridge',
    'double-wavy': 'ridge',
    'striped': 'ridge',
    'emboss': 'inset',
    'engrave': 'inset',
    'frame': 'ridge',
}


class RTFInput(InputFormatPlugin):

    name = 'RTF Input'
    author = 'Kovid Goyal'
    description = 'Convert RTF files to HTML'
    file_types = {'rtf'}
    commit_name = 'rtf_input'

    options = {
        OptionRecommendation(name='ignore_wmf', recommended_value=False,
            help=_('Ignore WMF images instead of replacing them with a placeholder image.')),
    }

    def generate_xml(self, stream):
        from calibre.ebooks.rtf2xml.ParseRtf import ParseRtf
        ofile = u'dataxml.xml'
        run_lev, debug_dir, indent_out = 1, None, 0
        if getattr(self.opts, 'debug_pipeline', None) is not None:
            try:
                os.mkdir(u'rtfdebug')
                debug_dir = u'rtfdebug'
                run_lev = 4
                indent_out = 1
                self.log('Running RTFParser in debug mode')
            except Exception:
                self.log.warn('Impossible to run RTFParser in debug mode')
        parser = ParseRtf(
            in_file=stream,
            out_file=ofile,
            # Convert symbol fonts to unicode equivalents. Default
            # is 1.
            convert_symbol=1,

            # Convert Zapf fonts to unicode equivalents. Default
            # is 1.
            convert_zapf=1,

            # Convert Wingding fonts to unicode equivalents.
            # Default is 1.
            convert_wingdings=1,

            # Convert RTF caps to real caps.
            # Default is 1.
            convert_caps=1,

            # Indent resulting XML.
            # Default is 0 (no indent).
            indent=indent_out,

            # Form lists from RTF. Default is 1.
            form_lists=1,

            # Convert headings to sections. Default is 0.
            headings_to_sections=1,

            # Group paragraphs with the same style name. Default is 1.
            group_styles=1,

            # Group borders. Default is 1.
            group_borders=1,

            # Write or do not write paragraphs. Default is 0.
            empty_paragraphs=1,

            # Debug
            deb_dir=debug_dir,

            # Default encoding
            default_encoding=getattr(self.opts, 'input_encoding', 'cp1252') or 'cp1252',

            # Run level
            run_level=run_lev,
        )
        parser.parse_rtf()
        with open(ofile, 'rb') as f:
            return f.read()

    def extract_images(self, picts):
        from calibre.utils.imghdr import what
        from binascii import unhexlify
        self.log('Extracting images...')

        with open(picts, 'rb') as f:
            raw = f.read()
        picts = filter(len, re.findall(br'\{\\pict([^}]+)\}', raw))
        hex_pat = re.compile(br'[^a-fA-F0-9]')
        encs = [hex_pat.sub(b'', pict) for pict in picts]

        count = 0
        imap = {}
        for enc in encs:
            if len(enc) % 2 == 1:
                enc = enc[:-1]
            data = unhexlify(enc)
            fmt = what(None, data)
            if fmt is None:
                fmt = 'wmf'
            count += 1
            name = u'%04d.%s' % (count, fmt)
            with open(name, 'wb') as f:
                f.write(data)
            imap[count] = name
            # with open(name+'.hex', 'wb') as f:
            #     f.write(enc)
        return self.convert_images(imap)

    def convert_images(self, imap):
        self.default_img = None
        for count, val in iteritems(imap):
            try:
                imap[count] = self.convert_image(val)
            except Exception:
                self.log.exception('Failed to convert', val)
        return imap

    def convert_image(self, name):
        if not name.endswith('.wmf'):
            return name
        try:
            return self.rasterize_wmf(name)
        except Exception:
            self.log.exception('Failed to convert WMF image %r' % name)
        return self.replace_wmf(name)

    def replace_wmf(self, name):
        if self.opts.ignore_wmf:
            os.remove(name)
            return '__REMOVE_ME__'
        from calibre.ebooks.covers import message_image
        if self.default_img is None:
            self.default_img = message_image('Conversion of WMF images is not supported.'
                ' Use Microsoft Word or OpenOffice to save this RTF file'
                ' as HTML and convert that in calibre.')
        name = name.replace('.wmf', '.jpg')
        with lopen(name, 'wb') as f:
            f.write(self.default_img)
        return name

    def rasterize_wmf(self, name):
        from calibre.utils.wmf.parse import wmf_unwrap
        with open(name, 'rb') as f:
            data = f.read()
        data = wmf_unwrap(data)
        name = name.replace('.wmf', '.png')
        with open(name, 'wb') as f:
            f.write(data)
        return name

    def write_inline_css(self, ic, border_styles):
        font_size_classes = ['span.fs%d { font-size: %spt }' % (i, x) for i, x in
                enumerate(ic.font_sizes)]
        color_classes = ['span.col%d { color: %s }' % (i, x) for i, x in
                enumerate(ic.colors) if x != 'false']
        css = textwrap.dedent('''
            span.none {
                text-decoration: none; font-weight: normal;
font-style: normal; font-variant: normal
|
||||||
|
}
|
||||||
|
|
||||||
|
span.italics { font-style: italic }
|
||||||
|
|
||||||
|
span.bold { font-weight: bold }
|
||||||
|
|
||||||
|
span.small-caps { font-variant: small-caps }
|
||||||
|
|
||||||
|
span.underlined { text-decoration: underline }
|
||||||
|
|
||||||
|
span.strike-through { text-decoration: line-through }
|
||||||
|
|
||||||
|
''')
|
||||||
|
css += '\n'+'\n'.join(font_size_classes)
|
||||||
|
css += '\n' +'\n'.join(color_classes)
|
||||||
|
|
||||||
|
for cls, val in iteritems(border_styles):
|
||||||
|
css += '\n\n.%s {\n%s\n}'%(cls, val)
|
||||||
|
|
||||||
|
with open(u'styles.css', 'ab') as f:
|
||||||
|
f.write(css.encode('utf-8'))
|
||||||
|
|
||||||
|
def convert_borders(self, doc):
|
||||||
|
border_styles = []
|
||||||
|
style_map = {}
|
||||||
|
for elem in doc.xpath(r'//*[local-name()="cell"]'):
|
||||||
|
style = ['border-style: hidden', 'border-width: 1px',
|
||||||
|
'border-color: black']
|
||||||
|
for x in ('bottom', 'top', 'left', 'right'):
|
||||||
|
bs = elem.get('border-cell-%s-style'%x, None)
|
||||||
|
if bs:
|
||||||
|
cbs = border_style_map.get(bs, 'solid')
|
||||||
|
style.append('border-%s-style: %s'%(x, cbs))
|
||||||
|
bw = elem.get('border-cell-%s-line-width'%x, None)
|
||||||
|
if bw:
|
||||||
|
style.append('border-%s-width: %spt'%(x, bw))
|
||||||
|
bc = elem.get('border-cell-%s-color'%x, None)
|
||||||
|
if bc:
|
||||||
|
style.append('border-%s-color: %s'%(x, bc))
|
||||||
|
style = ';\n'.join(style)
|
||||||
|
if style not in border_styles:
|
||||||
|
border_styles.append(style)
|
||||||
|
idx = border_styles.index(style)
|
||||||
|
cls = 'border_style%d'%idx
|
||||||
|
style_map[cls] = style
|
||||||
|
elem.set('class', cls)
|
||||||
|
return style_map
|
||||||
|
|
||||||
|
def convert(self, stream, options, file_ext, log,
|
||||||
|
accelerators):
|
||||||
|
from lxml import etree
|
||||||
|
from calibre.ebooks.metadata.meta import get_metadata
|
||||||
|
from calibre.ebooks.metadata.opf2 import OPFCreator
|
||||||
|
from calibre.ebooks.rtf2xml.ParseRtf import RtfInvalidCodeException
|
||||||
|
from calibre.ebooks.rtf.input import InlineClass
|
||||||
|
from calibre.utils.xml_parse import safe_xml_fromstring
|
||||||
|
self.opts = options
|
||||||
|
self.log = log
|
||||||
|
self.log('Converting RTF to XML...')
|
||||||
|
try:
|
||||||
|
xml = self.generate_xml(stream.name)
|
||||||
|
except RtfInvalidCodeException as e:
|
||||||
|
self.log.exception('Unable to parse RTF')
|
||||||
|
raise ValueError(_('This RTF file has a feature calibre does not '
|
||||||
|
'support. Convert it to HTML first and then try it.\n%s')%e)
|
||||||
|
|
||||||
|
d = glob.glob(os.path.join('*_rtf_pict_dir', 'picts.rtf'))
|
||||||
|
if d:
|
||||||
|
imap = {}
|
||||||
|
try:
|
||||||
|
imap = self.extract_images(d[0])
|
||||||
|
except:
|
||||||
|
self.log.exception('Failed to extract images...')
|
||||||
|
|
||||||
|
self.log('Parsing XML...')
|
||||||
|
doc = safe_xml_fromstring(xml)
|
||||||
|
border_styles = self.convert_borders(doc)
|
||||||
|
for pict in doc.xpath('//rtf:pict[@num]',
|
||||||
|
namespaces={'rtf':'http://rtf2xml.sourceforge.net/'}):
|
||||||
|
num = int(pict.get('num'))
|
||||||
|
name = imap.get(num, None)
|
||||||
|
if name is not None:
|
||||||
|
pict.set('num', name)
|
||||||
|
|
||||||
|
self.log('Converting XML to HTML...')
|
||||||
|
inline_class = InlineClass(self.log)
|
||||||
|
styledoc = safe_xml_fromstring(P('templates/rtf.xsl', data=True), recover=False)
|
||||||
|
extensions = {('calibre', 'inline-class') : inline_class}
|
||||||
|
transform = etree.XSLT(styledoc, extensions=extensions)
|
||||||
|
result = transform(doc)
|
||||||
|
html = u'index.xhtml'
|
||||||
|
with open(html, 'wb') as f:
|
||||||
|
res = as_bytes(transform.tostring(result))
|
||||||
|
# res = res[:100].replace('xmlns:html', 'xmlns') + res[100:]
|
||||||
|
# clean multiple \n
|
||||||
|
res = re.sub(b'\n+', b'\n', res)
|
||||||
|
# Replace newlines inserted by the 'empty_paragraphs' option in rtf2xml with html blank lines
|
||||||
|
# res = re.sub('\s*<body>', '<body>', res)
|
||||||
|
# res = re.sub('(?<=\n)\n{2}',
|
||||||
|
# u'<p>\u00a0</p>\n'.encode('utf-8'), res)
|
||||||
|
f.write(res)
|
||||||
|
self.write_inline_css(inline_class, border_styles)
|
||||||
|
stream.seek(0)
|
||||||
|
mi = get_metadata(stream, 'rtf')
|
||||||
|
if not mi.title:
|
||||||
|
mi.title = _('Unknown')
|
||||||
|
if not mi.authors:
|
||||||
|
mi.authors = [_('Unknown')]
|
||||||
|
opf = OPFCreator(getcwd(), mi)
|
||||||
|
opf.create_manifest([(u'index.xhtml', None)])
|
||||||
|
opf.create_spine([u'index.xhtml'])
|
||||||
|
opf.render(open(u'metadata.opf', 'wb'))
|
||||||
|
return os.path.abspath(u'metadata.opf')
|
||||||
|
|
||||||
|
def postprocess_book(self, oeb, opts, log):
|
||||||
|
for item in oeb.spine:
|
||||||
|
for img in item.data.xpath('//*[local-name()="img" and @src="__REMOVE_ME__"]'):
|
||||||
|
p = img.getparent()
|
||||||
|
idx = p.index(img)
|
||||||
|
p.remove(img)
|
||||||
|
if img.tail:
|
||||||
|
if idx == 0:
|
||||||
|
p.text = (p.text or '') + img.tail
|
||||||
|
else:
|
||||||
|
p[idx-1].tail = (p[idx-1].tail or '') + img.tail
|
||||||
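The `extract_images()` method above pulls hex-encoded `\pict` payloads out of the RTF bytes with a regex, strips non-hex characters, and decodes the result. A minimal standalone sketch of that technique (the function name and sample input are illustrative, not from calibre):

```python
import re
from binascii import unhexlify


def extract_pict_blobs(raw):
    """Decode hex-encoded {\\pict ...} payloads from raw RTF bytes,
    mirroring the regex + cleanup used in extract_images()."""
    picts = filter(len, re.findall(br'\{\\pict([^}]+)\}', raw))
    hex_pat = re.compile(br'[^a-fA-F0-9]')
    blobs = []
    for pict in picts:
        enc = hex_pat.sub(b'', pict)
        if len(enc) % 2 == 1:  # drop a stray trailing nibble
            enc = enc[:-1]
        blobs.append(unhexlify(enc))
    return blobs


# '48656c6c6f' is hex for b'Hello'
print(extract_pict_blobs(br'{\rtf1 {\pict 48656c6c6f} x}'))
```

Note that hex-looking letters in any extra control words inside the group would survive the character filter, so this is only a sketch of the decoding step, not a general RTF picture parser.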
ebook_converter/ebooks/conversion/plugins/rtf_output.py (Normal file, 40 lines)
@@ -0,0 +1,40 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import OutputFormatPlugin


class RTFOutput(OutputFormatPlugin):

    name = 'RTF Output'
    author = 'John Schember'
    file_type = 'rtf'
    commit_name = 'rtf_output'

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        from calibre.ebooks.rtf.rtfml import RTFMLizer

        rtfmlitzer = RTFMLizer(log)
        content = rtfmlitzer.extract_content(oeb_book, opts)

        close = False
        if not hasattr(output_path, 'write'):
            close = True
            if not os.path.exists(os.path.dirname(output_path)) and os.path.dirname(output_path) != '':
                os.makedirs(os.path.dirname(output_path))
            out_stream = lopen(output_path, 'wb')
        else:
            out_stream = output_path

        out_stream.seek(0)
        out_stream.truncate()
        out_stream.write(content.encode('ascii', 'replace'))

        if close:
            out_stream.close()
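Several output plugins in this import (RTF, TCR) repeat the same "path or writable stream" pattern: if `output_path` lacks a `write` method it is treated as a filesystem path, parent directories are created, and the file is opened; otherwise it is used directly. A minimal sketch of that pattern using plain `open` (calibre's code uses its `lopen` wrapper; `open_output` is a hypothetical helper name):

```python
import io
import os


def open_output(output_path):
    """Return (stream, close_when_done), following the pattern in
    RTFOutput.convert(): accept either a path or a writable object."""
    if not hasattr(output_path, 'write'):
        d = os.path.dirname(output_path)
        if d and not os.path.exists(d):
            os.makedirs(d)
        return open(output_path, 'wb'), True
    return output_path, False


buf = io.BytesIO()
stream, close = open_output(buf)
stream.write(b'{\\rtf1}')
# close is False: the caller owns the stream, so we must not close it.
```

The `close` flag matters because the plugin must close files it opened itself, but must leave caller-supplied streams open.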
ebook_converter/ebooks/conversion/plugins/snb_input.py (Normal file, 122 lines)
@@ -0,0 +1,122 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2010, Li Fanxi <lifanxi@freemindworld.com>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import InputFormatPlugin
from calibre.ptempfile import TemporaryDirectory
from calibre.utils.filenames import ascii_filename
from polyglot.builtins import unicode_type

HTML_TEMPLATE = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/><title>%s</title></head><body>\n%s\n</body></html>'


def html_encode(s):
    return (s.replace('&', '&amp;')
             .replace('<', '&lt;')
             .replace('>', '&gt;')
             .replace('"', '&quot;')
             .replace("'", '&#39;')
             .replace('\n', '<br/>')
             .replace(' ', '&nbsp;'))


class SNBInput(InputFormatPlugin):

    name = 'SNB Input'
    author = 'Li Fanxi'
    description = 'Convert SNB files to OEB'
    file_types = {'snb'}
    commit_name = 'snb_input'

    options = set()

    def convert(self, stream, options, file_ext, log,
                accelerators):
        import uuid

        from calibre.ebooks.oeb.base import DirContainer
        from calibre.ebooks.snb.snbfile import SNBFile
        from calibre.utils.xml_parse import safe_xml_fromstring

        log.debug("Parsing SNB file...")
        snbFile = SNBFile()
        try:
            snbFile.Parse(stream)
        except Exception:
            raise ValueError("Invalid SNB file")
        if not snbFile.IsValid():
            log.debug("Invalid SNB file")
            raise ValueError("Invalid SNB file")
        log.debug("Handle meta data ...")
        from calibre.ebooks.conversion.plumber import create_oebbook
        oeb = create_oebbook(log, None, options,
                encoding=options.input_encoding, populate=False)
        meta = snbFile.GetFileStream('snbf/book.snbf')
        if meta is not None:
            meta = safe_xml_fromstring(meta)
            l = {'title'    : './/head/name',
                 'creator'  : './/head/author',
                 'language' : './/head/language',
                 'generator': './/head/generator',
                 'publisher': './/head/publisher',
                 'cover'    : './/head/cover', }
            d = {}
            for item in l:
                node = meta.find(l[item])
                if node is not None:
                    d[item] = node.text if node.text is not None else ''
                else:
                    d[item] = ''

            oeb.metadata.add('title', d['title'])
            oeb.metadata.add('creator', d['creator'], attrib={'role': 'aut'})
            oeb.metadata.add('language', d['language'].lower().replace('_', '-'))
            oeb.metadata.add('generator', d['generator'])
            oeb.metadata.add('publisher', d['publisher'])
            if d['cover'] != '':
                oeb.guide.add('cover', 'Cover', d['cover'])

        bookid = unicode_type(uuid.uuid4())
        oeb.metadata.add('identifier', bookid, id='uuid_id', scheme='uuid')
        for ident in oeb.metadata.identifier:
            if 'id' in ident.attrib:
                oeb.uid = oeb.metadata.identifier[0]
                break

        with TemporaryDirectory('_snb2oeb', keep=True) as tdir:
            log.debug('Process TOC ...')
            toc = snbFile.GetFileStream('snbf/toc.snbf')
            oeb.container = DirContainer(tdir, log)
            if toc is not None:
                toc = safe_xml_fromstring(toc)
                i = 1
                for ch in toc.find('.//body'):
                    chapterName = ch.text
                    chapterSrc = ch.get('src')
                    fname = 'ch_%d.htm' % i
                    data = snbFile.GetFileStream('snbc/' + chapterSrc)
                    if data is None:
                        continue
                    snbc = safe_xml_fromstring(data)
                    lines = []
                    for line in snbc.find('.//body'):
                        if line.tag == 'text':
                            lines.append('<p>%s</p>' % html_encode(line.text))
                        elif line.tag == 'img':
                            lines.append('<p><img src="%s" /></p>' % html_encode(line.text))
                    with open(os.path.join(tdir, fname), 'wb') as f:
                        f.write((HTML_TEMPLATE % (chapterName, '\n'.join(lines))).encode('utf-8', 'replace'))
                    oeb.toc.add(ch.text, fname)
                    id, href = oeb.manifest.generate(id='html',
                            href=ascii_filename(fname))
                    item = oeb.manifest.add(id, href, 'text/html')
                    item.html_input_href = fname
                    oeb.spine.add(item, True)
                    i = i + 1
            imageFiles = snbFile.OutputImageFiles(tdir)
            for f, m in imageFiles:
                id, href = oeb.manifest.generate(id='image',
                        href=ascii_filename(f))
                item = oeb.manifest.add(id, href, m)
                item.html_input_href = f

        return oeb
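The `html_encode()` helper above chains `str.replace` calls; the order matters, since `&` must be escaped first so the `&` inside later entities is not escaped twice. A self-contained copy of the same chain, for quick verification:

```python
def html_encode(s):
    # '&' first, otherwise the '&' in '&lt;', '&gt;', etc. would be
    # re-escaped to '&amp;lt;' and so on.
    return (s.replace('&', '&amp;')
             .replace('<', '&lt;')
             .replace('>', '&gt;')
             .replace('"', '&quot;')
             .replace("'", '&#39;')
             .replace('\n', '<br/>')
             .replace(' ', '&nbsp;'))


print(html_encode('a < b & "c"\nd'))
# a&nbsp;&lt;&nbsp;b&nbsp;&amp;&nbsp;&quot;c&quot;<br/>d
```

Replacing every space with `&nbsp;` and every newline with `<br/>` preserves the fixed layout of SNB text lines in the generated HTML.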
ebook_converter/ebooks/conversion/plugins/snb_output.py (Normal file, 269 lines)
@@ -0,0 +1,269 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2010, Li Fanxi <lifanxi@freemindworld.com>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import OutputFormatPlugin, OptionRecommendation
from calibre.ptempfile import TemporaryDirectory
from calibre.constants import __appname__, __version__
from polyglot.builtins import unicode_type


class SNBOutput(OutputFormatPlugin):

    name = 'SNB Output'
    author = 'Li Fanxi'
    file_type = 'snb'
    commit_name = 'snb_output'

    options = {
        OptionRecommendation(name='snb_output_encoding', recommended_value='utf-8',
            level=OptionRecommendation.LOW,
            help=_('Specify the character encoding of the output document. '
                'The default is utf-8.')),
        OptionRecommendation(name='snb_max_line_length',
            recommended_value=0, level=OptionRecommendation.LOW,
            help=_('The maximum number of characters per line. This splits on '
                'the first space before the specified value. If no space is found '
                'the line will be broken at the space after and will exceed the '
                'specified value. Also, there is a minimum of 25 characters. '
                'Use 0 to disable line splitting.')),
        OptionRecommendation(name='snb_insert_empty_line',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Specify whether or not to insert an empty line between '
                'two paragraphs.')),
        OptionRecommendation(name='snb_dont_indent_first_line',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Specify whether or not to insert two space characters '
                'to indent the first line of each paragraph.')),
        OptionRecommendation(name='snb_hide_chapter_name',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Specify whether or not to hide the chapter title for each '
                'chapter. Useful for image-only output (e.g. comics).')),
        OptionRecommendation(name='snb_full_screen',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Resize all the images for full screen view.')),
    }

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        from lxml import etree
        from calibre.ebooks.snb.snbfile import SNBFile
        from calibre.ebooks.snb.snbml import SNBMLizer, ProcessFileName

        self.opts = opts
        from calibre.ebooks.oeb.transforms.rasterize import SVGRasterizer, Unavailable
        try:
            rasterizer = SVGRasterizer()
            rasterizer(oeb_book, opts)
        except Unavailable:
            log.warn('SVG rasterizer unavailable, SVG will not be converted')

        # Create temp dir
        with TemporaryDirectory('_snb_output') as tdir:
            # Create stub directories
            snbfDir = os.path.join(tdir, 'snbf')
            snbcDir = os.path.join(tdir, 'snbc')
            snbiDir = os.path.join(tdir, 'snbc/images')
            os.mkdir(snbfDir)
            os.mkdir(snbcDir)
            os.mkdir(snbiDir)

            # Process meta data
            meta = oeb_book.metadata
            if meta.title:
                title = unicode_type(meta.title[0])
            else:
                title = ''
            authors = [unicode_type(x) for x in meta.creator if x.role == 'aut']
            if meta.publisher:
                publishers = unicode_type(meta.publisher[0])
            else:
                publishers = ''
            if meta.language:
                lang = unicode_type(meta.language[0]).upper()
            else:
                lang = ''
            if meta.description:
                abstract = unicode_type(meta.description[0])
            else:
                abstract = ''

            # Process cover
            g, m, s = oeb_book.guide, oeb_book.manifest, oeb_book.spine
            href = None
            if 'titlepage' not in g:
                if 'cover' in g:
                    href = g['cover'].href

            # Output book info file
            bookInfoTree = etree.Element("book-snbf", version="1.0")
            headTree = etree.SubElement(bookInfoTree, "head")
            etree.SubElement(headTree, "name").text = title
            etree.SubElement(headTree, "author").text = ' '.join(authors)
            etree.SubElement(headTree, "language").text = lang
            etree.SubElement(headTree, "rights")
            etree.SubElement(headTree, "publisher").text = publishers
            etree.SubElement(headTree, "generator").text = __appname__ + ' ' + __version__
            etree.SubElement(headTree, "created")
            etree.SubElement(headTree, "abstract").text = abstract
            if href is not None:
                etree.SubElement(headTree, "cover").text = ProcessFileName(href)
            else:
                etree.SubElement(headTree, "cover")
            with open(os.path.join(snbfDir, 'book.snbf'), 'wb') as f:
                f.write(etree.tostring(bookInfoTree, pretty_print=True, encoding='utf-8'))

            # Output TOC
            tocInfoTree = etree.Element("toc-snbf")
            tocHead = etree.SubElement(tocInfoTree, "head")
            tocBody = etree.SubElement(tocInfoTree, "body")
            outputFiles = {}
            if oeb_book.toc.count() == 0:
                log.warn('This SNB file has no Table of Contents. '
                        'Creating a default TOC')
                first = next(iter(oeb_book.spine))
                oeb_book.toc.add(_('Start page'), first.href)
            else:
                first = next(iter(oeb_book.spine))
                if oeb_book.toc[0].href != first.href:
                    # The pages before the first item in the TOC will be
                    # stored as "Cover Pages".
                    # oeb_book.toc does not support "insert", so we generate
                    # the tocInfoTree directly instead of modifying the toc
                    ch = etree.SubElement(tocBody, "chapter")
                    ch.set("src", ProcessFileName(first.href) + ".snbc")
                    ch.text = _('Cover pages')
                    outputFiles[first.href] = []
                    outputFiles[first.href].append(("", _("Cover pages")))

            for tocitem in oeb_book.toc:
                if tocitem.href.find('#') != -1:
                    item = tocitem.href.split('#')
                    if len(item) != 2:
                        log.error('Error in TOC item: %s' % tocitem)
                    else:
                        if item[0] in outputFiles:
                            outputFiles[item[0]].append((item[1], tocitem.title))
                        else:
                            outputFiles[item[0]] = []
                            if "" not in outputFiles[item[0]]:
                                outputFiles[item[0]].append(("", tocitem.title + _(" (Preface)")))
                                ch = etree.SubElement(tocBody, "chapter")
                                ch.set("src", ProcessFileName(item[0]) + ".snbc")
                                ch.text = tocitem.title + _(" (Preface)")
                            outputFiles[item[0]].append((item[1], tocitem.title))
                else:
                    if tocitem.href in outputFiles:
                        outputFiles[tocitem.href].append(("", tocitem.title))
                    else:
                        outputFiles[tocitem.href] = []
                        outputFiles[tocitem.href].append(("", tocitem.title))
                    ch = etree.SubElement(tocBody, "chapter")
                    ch.set("src", ProcessFileName(tocitem.href) + ".snbc")
                    ch.text = tocitem.title

            etree.SubElement(tocHead, "chapters").text = '%d' % len(tocBody)

            with open(os.path.join(snbfDir, 'toc.snbf'), 'wb') as f:
                f.write(etree.tostring(tocInfoTree, pretty_print=True, encoding='utf-8'))

            # Output files
            oldTree = None
            mergeLast = False
            lastName = None
            for item in s:
                from calibre.ebooks.oeb.base import OEB_DOCS, OEB_IMAGES
                if m.hrefs[item.href].media_type in OEB_DOCS:
                    if item.href not in outputFiles:
                        log.debug('File %s is unused in TOC. Continue in last chapter' % item.href)
                        mergeLast = True
                    else:
                        if oldTree is not None and mergeLast:
                            log.debug('Output the modified chapter again: %s' % lastName)
                            with open(os.path.join(snbcDir, lastName), 'wb') as f:
                                f.write(etree.tostring(oldTree, pretty_print=True, encoding='utf-8'))
                            mergeLast = False

                    log.debug('Converting %s to snbc...' % item.href)
                    snbwriter = SNBMLizer(log)
                    snbcTrees = None
                    if not mergeLast:
                        snbcTrees = snbwriter.extract_content(oeb_book, item, outputFiles[item.href], opts)
                        for subName in snbcTrees:
                            postfix = ''
                            if subName != '':
                                postfix = '_' + subName
                            lastName = ProcessFileName(item.href + postfix + ".snbc")
                            oldTree = snbcTrees[subName]
                            with open(os.path.join(snbcDir, lastName), 'wb') as f:
                                f.write(etree.tostring(oldTree, pretty_print=True, encoding='utf-8'))
                    else:
                        log.debug('Merge %s with last TOC item...' % item.href)
                        snbwriter.merge_content(oldTree, oeb_book, item, [('', _("Start"))], opts)

            # Output the last one if needed
            log.debug('Output the last modified chapter again: %s' % lastName)
            if oldTree is not None and mergeLast:
                with open(os.path.join(snbcDir, lastName), 'wb') as f:
                    f.write(etree.tostring(oldTree, pretty_print=True, encoding='utf-8'))
                mergeLast = False

            for item in m:
                if m.hrefs[item.href].media_type in OEB_IMAGES:
                    log.debug('Converting image: %s ...' % item.href)
                    content = m.hrefs[item.href].data
                    # Convert & resize image
                    self.HandleImage(content, os.path.join(snbiDir, ProcessFileName(item.href)))

            # Package as SNB file
            snbFile = SNBFile()
            snbFile.FromDir(tdir)
            snbFile.Output(output_path)

    def HandleImage(self, imageData, imagePath):
        from calibre.utils.img import image_from_data, resize_image, image_to_data
        img = image_from_data(imageData)
        x, y = img.width(), img.height()
        if self.opts:
            if self.opts.snb_full_screen:
                SCREEN_X, SCREEN_Y = self.opts.output_profile.screen_size
            else:
                SCREEN_X, SCREEN_Y = self.opts.output_profile.comic_screen_size
        else:
            SCREEN_X = 540
            SCREEN_Y = 700
        # Handle big images only
        if x > SCREEN_X or y > SCREEN_Y:
            xScale = float(x) / SCREEN_X
            yScale = float(y) / SCREEN_Y
            scale = max(xScale, yScale)
            # TODO: intelligent image rotation
            # img = img.rotate(90)
            # x, y = y, x
            img = resize_image(img, x // scale, y // scale)
        with lopen(imagePath, 'wb') as f:
            f.write(image_to_data(img, fmt=imagePath.rpartition('.')[-1]))


if __name__ == '__main__':
    from calibre.ebooks.oeb.reader import OEBReader
    from calibre.ebooks.oeb.base import OEBBook
    from calibre.ebooks.conversion.preprocess import HTMLPreProcessor
    from calibre.customize.profiles import HanlinV3Output

    class OptionValues(object):
        pass

    opts = OptionValues()
    opts.output_profile = HanlinV3Output(None)

    html_preprocessor = HTMLPreProcessor(None, None, opts)
    from calibre.utils.logging import default_log
    oeb = OEBBook(default_log, html_preprocessor)
    reader = OEBReader
    reader()(oeb, '/tmp/bbb/processed/')
    SNBOutput(None).convert(oeb, '/tmp/test.snb', None, None, default_log)
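`HandleImage()` above downscales only oversized images, dividing both dimensions by the larger of the width and height overshoot ratios so the aspect ratio is preserved and the result fits the screen. The scale computation can be isolated as a small sketch (`fit_scale` is an illustrative name, not part of the plugin):

```python
def fit_scale(x, y, screen_x, screen_y):
    """Return the divisor used to shrink an oversized x*y image so it
    fits within screen_x*screen_y, as in SNBOutput.HandleImage()."""
    if x > screen_x or y > screen_y:
        # The larger overshoot ratio wins, so both dimensions fit.
        return max(float(x) / screen_x, float(y) / screen_y)
    return 1.0  # small images are left untouched


# A 1080x700 image on the 540x700 fallback screen is halved to 540x350.
print(fit_scale(1080, 700, 540, 700))  # 2.0
```

Note that `x // scale` in the original floor-divides by a float, so the resized dimensions are whole-valued floats; the image library is expected to accept them.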
ebook_converter/ebooks/conversion/plugins/tcr_input.py (Normal file, 39 lines)
@@ -0,0 +1,39 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

from io import BytesIO

from calibre.customize.conversion import InputFormatPlugin


class TCRInput(InputFormatPlugin):

    name = 'TCR Input'
    author = 'John Schember'
    description = 'Convert TCR files to HTML'
    file_types = {'tcr'}
    commit_name = 'tcr_input'

    def convert(self, stream, options, file_ext, log, accelerators):
        from calibre.ebooks.compression.tcr import decompress

        log.info('Decompressing text...')
        raw_txt = decompress(stream)

        log.info('Converting text to OEB...')
        stream = BytesIO(raw_txt)

        from calibre.customize.ui import plugin_for_input_format

        txt_plugin = plugin_for_input_format('txt')
        for opt in txt_plugin.options:
            if not hasattr(options, opt.option.name):
                setattr(options, opt.option.name, opt.recommended_value)

        stream.seek(0)
        return txt_plugin.convert(stream, options,
                'txt', log, accelerators)
ebook_converter/ebooks/conversion/plugins/tcr_output.py (Normal file, 56 lines)
@@ -0,0 +1,56 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import os

from calibre.customize.conversion import OutputFormatPlugin, \
    OptionRecommendation


class TCROutput(OutputFormatPlugin):

    name = 'TCR Output'
    author = 'John Schember'
    file_type = 'tcr'
    commit_name = 'tcr_output'

    options = {
        OptionRecommendation(name='tcr_output_encoding', recommended_value='utf-8',
            level=OptionRecommendation.LOW,
            help=_('Specify the character encoding of the output document. '
                'The default is utf-8.'))}

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        from calibre.ebooks.txt.txtml import TXTMLizer
        from calibre.ebooks.compression.tcr import compress

        close = False
        if not hasattr(output_path, 'write'):
            close = True
            if not os.path.exists(os.path.dirname(output_path)) and os.path.dirname(output_path):
                os.makedirs(os.path.dirname(output_path))
            out_stream = lopen(output_path, 'wb')
        else:
            out_stream = output_path

        setattr(opts, 'flush_paras', False)
        setattr(opts, 'max_line_length', 0)
        setattr(opts, 'force_max_line_length', False)
        setattr(opts, 'indent_paras', False)

        writer = TXTMLizer(log)
        txt = writer.extract_content(oeb_book, opts).encode(opts.tcr_output_encoding, 'replace')

        log.info('Compressing text...')
        txt = compress(txt)

        out_stream.seek(0)
        out_stream.truncate()
        out_stream.write(txt)

        if close:
            out_stream.close()
ebook_converter/ebooks/conversion/plugins/txt_input.py (Normal file, 308 lines)
@@ -0,0 +1,308 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import os

from calibre import _ent_pat, walk, xml_entity_to_unicode
from calibre.customize.conversion import InputFormatPlugin, OptionRecommendation
from polyglot.builtins import getcwd

MD_EXTENSIONS = {
    'abbr': _('Abbreviations'),
    'admonition': _('Support admonitions'),
    'attr_list': _('Add attribute to HTML tags'),
    'codehilite': _('Add code highlighting via Pygments'),
    'def_list': _('Definition lists'),
    'extra': _('Enables various common extensions'),
    'fenced_code': _('Alternative code block syntax'),
    'footnotes': _('Footnotes'),
    'legacy_attrs': _('Use legacy element attributes'),
    'legacy_em': _('Use legacy underscore handling for connected words'),
    'meta': _('Metadata in the document'),
    'nl2br': _('Treat newlines as hard breaks'),
    'sane_lists': _('Do not allow mixing list types'),
    'smarty': _('Use markdown\'s internal smartypants parser'),
    'tables': _('Support tables'),
    'toc': _('Generate a table of contents'),
    'wikilinks': _('Wiki style links'),
}


class TXTInput(InputFormatPlugin):

    name = 'TXT Input'
    author = 'John Schember'
    description = 'Convert TXT files to HTML'
    file_types = {'txt', 'txtz', 'text', 'md', 'textile', 'markdown'}
    commit_name = 'txt_input'
    ui_data = {
        'md_extensions': MD_EXTENSIONS,
        'paragraph_types': {
            'auto': _('Try to auto detect paragraph type'),
            'block': _('Treat a blank line as a paragraph break'),
            'single': _('Assume every line is a paragraph'),
            'print': _('Assume every line starting with 2+ spaces or a tab starts a paragraph'),
            'unformatted': _('Most lines have hard line breaks, few/no blank lines or indents'),
            'off': _('Don\'t modify the paragraph structure'),
        },
        'formatting_types': {
            'auto': _('Automatically decide which formatting processor to use'),
            'plain': _('No formatting'),
            'heuristic': _('Use heuristics to determine chapter headings, italics, etc.'),
            'textile': _('Use the TexTile markup language'),
            'markdown': _('Use the Markdown markup language')
        },
    }

    options = {
        OptionRecommendation(name='formatting_type', recommended_value='auto',
            choices=list(ui_data['formatting_types']),
            help=_('Formatting used within the document.\n'
                   '* auto: {auto}\n'
                   '* plain: {plain}\n'
                   '* heuristic: {heuristic}\n'
                   '* textile: {textile}\n'
                   '* markdown: {markdown}\n'
                   'To learn more about markdown see {url}').format(
                url='https://daringfireball.net/projects/markdown/', **ui_data['formatting_types'])
        ),
        OptionRecommendation(name='paragraph_type', recommended_value='auto',
            choices=list(ui_data['paragraph_types']),
            help=_('Paragraph structure to assume. The value of "off" is useful for formatted documents such as Markdown or Textile. '
                   'Choices are:\n'
                   '* auto: {auto}\n'
                   '* block: {block}\n'
                   '* single: {single}\n'
                   '* print: {print}\n'
                   '* unformatted: {unformatted}\n'
                   '* off: {off}').format(**ui_data['paragraph_types'])
        ),
        OptionRecommendation(name='preserve_spaces', recommended_value=False,
            help=_('Normally extra spaces are condensed into a single space. '
                   'With this option all spaces will be displayed.')),
        OptionRecommendation(name='txt_in_remove_indents', recommended_value=False,
            help=_('Normally extra space at the beginning of lines is retained. '
                   'With this option they will be removed.')),
        OptionRecommendation(name="markdown_extensions", recommended_value='footnotes, tables, toc',
            help=_('Enable extensions to markdown syntax. Extensions are formatting that is not part '
                   'of the standard markdown format. The extensions enabled by default: %default.\n'
                   'To learn more about markdown extensions, see {}\n'
                   'This should be a comma separated list of extensions to enable:\n'
                   ).format('https://python-markdown.github.io/extensions/') + '\n'.join('* %s: %s' % (k, MD_EXTENSIONS[k]) for k in sorted(MD_EXTENSIONS))),
    }

    def shift_file(self, fname, data):
        name, ext = os.path.splitext(fname)
        candidate = os.path.join(self.output_dir, fname)
        c = 0
        while os.path.exists(candidate):
            c += 1
            candidate = os.path.join(self.output_dir, '{}-{}{}'.format(name, c, ext))
        ans = candidate
        with open(ans, 'wb') as f:
            f.write(data)
        return f.name

    def fix_resources(self, html, base_dir):
        from html5_parser import parse
        root = parse(html)
        changed = False
        for img in root.xpath('//img[@src]'):
            src = img.get('src')
            prefix = src.split(':', 1)[0].lower()
            if prefix not in ('file', 'http', 'https', 'ftp') and not os.path.isabs(src):
                src = os.path.join(base_dir, src)
                if os.access(src, os.R_OK):
                    with open(src, 'rb') as f:
                        data = f.read()
                    f = self.shift_file(os.path.basename(src), data)
                    changed = True
                    img.set('src', os.path.basename(f))
        if changed:
            from lxml import etree
            html = etree.tostring(root, encoding='unicode')
        return html

    def convert(self, stream, options, file_ext, log,
                accelerators):
        from calibre.ebooks.conversion.preprocess import DocAnalysis, Dehyphenator
        from calibre.ebooks.chardet import detect
        from calibre.utils.zipfile import ZipFile
        from calibre.ebooks.txt.processor import (convert_basic,
                convert_markdown_with_metadata, separate_paragraphs_single_line,
                separate_paragraphs_print_formatted, preserve_spaces,
                detect_paragraph_type, detect_formatting_type,
                normalize_line_endings, convert_textile, remove_indents,
                block_to_single_line, separate_hard_scene_breaks)

        self.log = log
        txt = b''
        log.debug('Reading text from file...')
        length = 0
        base_dir = self.output_dir = getcwd()

        # Extract content from zip archive.
        if file_ext == 'txtz':
            zf = ZipFile(stream)
            zf.extractall('.')

            for x in walk('.'):
                if os.path.splitext(x)[1].lower() in ('.txt', '.text'):
                    with open(x, 'rb') as tf:
                        txt += tf.read() + b'\n\n'
        else:
            if getattr(stream, 'name', None):
                base_dir = os.path.dirname(stream.name)
            txt = stream.read()
            if file_ext in {'md', 'textile', 'markdown'}:
                options.formatting_type = {'md': 'markdown'}.get(file_ext, file_ext)
                log.info('File extension indicates particular formatting. '
                         'Forcing formatting type to: %s' % options.formatting_type)
                options.paragraph_type = 'off'

        # Get the encoding of the document.
        if options.input_encoding:
            ienc = options.input_encoding
            log.debug('Using user specified input encoding of %s' % ienc)
        else:
            det_encoding = detect(txt[:4096])
            det_encoding, confidence = det_encoding['encoding'], det_encoding['confidence']
            if det_encoding and det_encoding.lower().replace('_', '-').strip() in (
                    'gb2312', 'chinese', 'csiso58gb231280', 'euc-cn', 'euccn',
                    'eucgb2312-cn', 'gb2312-1980', 'gb2312-80', 'iso-ir-58'):
                # Microsoft Word exports to HTML with encoding incorrectly set to
                # gb2312 instead of gbk. gbk is a superset of gb2312, anyway.
                det_encoding = 'gbk'
            ienc = det_encoding
            log.debug('Detected input encoding as %s with a confidence of %s%%' % (ienc, confidence * 100))
        if not ienc:
            ienc = 'utf-8'
            log.debug('No input encoding specified and could not auto detect, using %s' % ienc)
        # Remove BOM from start of txt as its presence can confuse markdown
        import codecs
        for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE, codecs.BOM_UTF8, codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE):
            if txt.startswith(bom):
                txt = txt[len(bom):]
                break
        txt = txt.decode(ienc, 'replace')

        # Replace entities
        txt = _ent_pat.sub(xml_entity_to_unicode, txt)

        # Normalize line endings
        txt = normalize_line_endings(txt)

        # Determine the paragraph type of the document.
        if options.paragraph_type == 'auto':
            options.paragraph_type = detect_paragraph_type(txt)
            if options.paragraph_type == 'unknown':
                log.debug('Could not reliably determine paragraph type, using block')
                options.paragraph_type = 'block'
            else:
                log.debug('Auto detected paragraph type as %s' % options.paragraph_type)

        # Detect formatting
        if options.formatting_type == 'auto':
            options.formatting_type = detect_formatting_type(txt)
            log.debug('Auto detected formatting as %s' % options.formatting_type)

        if options.formatting_type == 'heuristic':
            setattr(options, 'enable_heuristics', True)
            setattr(options, 'unwrap_lines', False)
            setattr(options, 'smarten_punctuation', True)

        # Reformat paragraphs to block formatting based on the detected type.
        # We don't check for block because the processor assumes block.
        # single and print are transformed to block for processing.
        if options.paragraph_type == 'single':
            txt = separate_paragraphs_single_line(txt)
        elif options.paragraph_type == 'print':
            txt = separate_hard_scene_breaks(txt)
            txt = separate_paragraphs_print_formatted(txt)
            txt = block_to_single_line(txt)
        elif options.paragraph_type == 'unformatted':
            from calibre.ebooks.conversion.utils import HeuristicProcessor
            # unwrap lines based on punctuation
            docanalysis = DocAnalysis('txt', txt)
            length = docanalysis.line_length(.5)
            preprocessor = HeuristicProcessor(options, log=getattr(self, 'log', None))
            txt = preprocessor.punctuation_unwrap(length, txt, 'txt')
            txt = separate_paragraphs_single_line(txt)
        elif options.paragraph_type == 'block':
            txt = separate_hard_scene_breaks(txt)
            txt = block_to_single_line(txt)

        if getattr(options, 'enable_heuristics', False) and getattr(options, 'dehyphenate', False):
            docanalysis = DocAnalysis('txt', txt)
            if not length:
                length = docanalysis.line_length(.5)
            dehyphenator = Dehyphenator(options.verbose, log=self.log)
            txt = dehyphenator(txt, 'txt', length)

        # User requested transformation on the text.
        if options.txt_in_remove_indents:
            txt = remove_indents(txt)

        # Preserve spaces will replace multiple spaces with a space
        # followed by the &nbsp; entity.
        if options.preserve_spaces:
            txt = preserve_spaces(txt)

        # Process the text using the appropriate text processor.
        self.shifted_files = []
        try:
            html = ''
            input_mi = None
            if options.formatting_type == 'markdown':
                log.debug('Running text through markdown conversion...')
                try:
                    input_mi, html = convert_markdown_with_metadata(txt, extensions=[x.strip() for x in options.markdown_extensions.split(',') if x.strip()])
                except RuntimeError:
                    raise ValueError('This txt file has malformed markup, it cannot be'
                                     ' converted by calibre. See https://daringfireball.net/projects/markdown/syntax')
                html = self.fix_resources(html, base_dir)
            elif options.formatting_type == 'textile':
                log.debug('Running text through textile conversion...')
                html = convert_textile(txt)
                html = self.fix_resources(html, base_dir)
            else:
                log.debug('Running text through basic conversion...')
                flow_size = getattr(options, 'flow_size', 0)
                html = convert_basic(txt, epub_split_size_kb=flow_size)

            # Run the HTMLized text through the html processing plugin.
            from calibre.customize.ui import plugin_for_input_format
            html_input = plugin_for_input_format('html')
            for opt in html_input.options:
                setattr(options, opt.option.name, opt.recommended_value)
            options.input_encoding = 'utf-8'
            htmlfile = self.shift_file('index.html', html.encode('utf-8'))
            odi = options.debug_pipeline
            options.debug_pipeline = None
            # Generate oeb from html conversion.
            oeb = html_input.convert(open(htmlfile, 'rb'), options, 'html', log, {})
            options.debug_pipeline = odi
        finally:
            for x in self.shifted_files:
                os.remove(x)

        # Set metadata from file.
        if input_mi is None:
            from calibre.customize.ui import get_file_type_metadata
            input_mi = get_file_type_metadata(stream, file_ext)
        from calibre.ebooks.oeb.transforms.metadata import meta_info_to_oeb_metadata
        meta_info_to_oeb_metadata(input_mi, oeb.metadata, log)
        self.html_postprocess_title = input_mi.title

        return oeb

    def postprocess_book(self, oeb, opts, log):
        for item in oeb.spine:
            if hasattr(item.data, 'xpath'):
                for title in item.data.xpath('//*[local-name()="title"]'):
                    if title.text == _('Unknown'):
                        title.text = self.html_postprocess_title
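The encoding logic in `TXTInput.convert` above detects an encoding, falls back to utf-8, and strips a leading BOM before decoding so that markdown is not confused by it. A standalone sketch of the BOM-strip-and-decode step, with one deliberate deviation: the UTF-32 BOMs are tested before the UTF-16 ones, since `BOM_UTF32_LE` begins with the same two bytes as `BOM_UTF16_LE` (the helper name is illustrative, not part of the module):

```python
import codecs

# UTF-32 BOMs must come before UTF-16: b'\xff\xfe\x00\x00' starts with b'\xff\xfe'.
_BOM_ENCODINGS = (
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
    (codecs.BOM_UTF8, 'utf-8'),
)


def decode_txt(raw, ienc=None):
    # Strip a leading BOM; if no encoding was specified, the BOM pins it down.
    for bom, enc in _BOM_ENCODINGS:
        if raw.startswith(bom):
            raw = raw[len(bom):]
            if ienc is None:
                ienc = enc
            break
    # Decode with a lossy fallback, mirroring txt.decode(ienc, 'replace').
    return raw.decode(ienc or 'utf-8', 'replace')
```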
ebook_converter/ebooks/conversion/plugins/txt_output.py (Normal file, 165 lines)
@@ -0,0 +1,165 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL 3'
__copyright__ = '2009, John Schember <john@nachtimwald.com>'
__docformat__ = 'restructuredtext en'

import os
import shutil


from calibre.customize.conversion import OutputFormatPlugin, \
    OptionRecommendation
from calibre.ptempfile import TemporaryDirectory, TemporaryFile

NEWLINE_TYPES = ['system', 'unix', 'old_mac', 'windows']


class TXTOutput(OutputFormatPlugin):

    name = 'TXT Output'
    author = 'John Schember'
    file_type = 'txt'
    commit_name = 'txt_output'
    ui_data = {
        'newline_types': NEWLINE_TYPES,
        'formatting_types': {
            'plain': _('Plain text'),
            'markdown': _('Markdown formatted text'),
            'textile': _('TexTile formatted text')
        },
    }

    options = {
        OptionRecommendation(name='newline', recommended_value='system',
            level=OptionRecommendation.LOW,
            short_switch='n', choices=NEWLINE_TYPES,
            help=_('Type of newline to use. Options are %s. Default is \'system\'. '
                   'Use \'old_mac\' for compatibility with Mac OS 9 and earlier. '
                   'For macOS use \'unix\'. \'system\' will default to the newline '
                   'type used by this OS.') % sorted(NEWLINE_TYPES)),
        OptionRecommendation(name='txt_output_encoding', recommended_value='utf-8',
            level=OptionRecommendation.LOW,
            help=_('Specify the character encoding of the output document. '
                   'The default is utf-8.')),
        OptionRecommendation(name='inline_toc',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Add Table of Contents to beginning of the book.')),
        OptionRecommendation(name='max_line_length',
            recommended_value=0, level=OptionRecommendation.LOW,
            help=_('The maximum number of characters per line. This splits on '
                   'the first space before the specified value. If no space is found '
                   'the line will be broken at the space after and will exceed the '
                   'specified value. Also, there is a minimum of 25 characters. '
                   'Use 0 to disable line splitting.')),
        OptionRecommendation(name='force_max_line_length',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Force splitting on the max-line-length value when no space '
                   'is present. Also allows max-line-length to be below the minimum.')),
        OptionRecommendation(name='txt_output_formatting',
            recommended_value='plain',
            choices=list(ui_data['formatting_types']),
            help=_('Formatting used within the document.\n'
                   '* plain: {plain}\n'
                   '* markdown: {markdown}\n'
                   '* textile: {textile}').format(**ui_data['formatting_types'])),
        OptionRecommendation(name='keep_links',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Do not remove links within the document. This is only '
                   'useful when paired with a txt-output-formatting option that '
                   'is not none because links are always removed with plain text output.')),
        OptionRecommendation(name='keep_image_references',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Do not remove image references within the document. This is only '
                   'useful when paired with a txt-output-formatting option that '
                   'is not none because links are always removed with plain text output.')),
        OptionRecommendation(name='keep_color',
            recommended_value=False, level=OptionRecommendation.LOW,
            help=_('Do not remove font color from output. This is only useful when '
                   'txt-output-formatting is set to textile. Textile is the only '
                   'formatting that supports setting font color. If this option is '
                   'not specified font color will not be set and default to the '
                   'color displayed by the reader (generally this is black).')),
    }

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        from calibre.ebooks.txt.txtml import TXTMLizer
        from calibre.utils.cleantext import clean_ascii_chars
        from calibre.ebooks.txt.newlines import specified_newlines, TxtNewlines

        if opts.txt_output_formatting.lower() == 'markdown':
            from calibre.ebooks.txt.markdownml import MarkdownMLizer
            self.writer = MarkdownMLizer(log)
        elif opts.txt_output_formatting.lower() == 'textile':
            from calibre.ebooks.txt.textileml import TextileMLizer
            self.writer = TextileMLizer(log)
        else:
            self.writer = TXTMLizer(log)

        txt = self.writer.extract_content(oeb_book, opts)
        txt = clean_ascii_chars(txt)

        log.debug('\tReplacing newlines with selected type...')
        txt = specified_newlines(TxtNewlines(opts.newline).newline, txt)

        close = False
        if not hasattr(output_path, 'write'):
            close = True
            if not os.path.exists(os.path.dirname(output_path)) and os.path.dirname(output_path) != '':
                os.makedirs(os.path.dirname(output_path))
            out_stream = open(output_path, 'wb')
        else:
            out_stream = output_path

        out_stream.seek(0)
        out_stream.truncate()
        out_stream.write(txt.encode(opts.txt_output_encoding, 'replace'))

        if close:
            out_stream.close()


class TXTZOutput(TXTOutput):

    name = 'TXTZ Output'
    author = 'John Schember'
    file_type = 'txtz'

    def convert(self, oeb_book, output_path, input_plugin, opts, log):
        from calibre.ebooks.oeb.base import OEB_IMAGES
        from calibre.utils.zipfile import ZipFile
        from lxml import etree

        with TemporaryDirectory('_txtz_output') as tdir:
            # TXT
            txt_name = 'index.txt'
            if opts.txt_output_formatting.lower() == 'textile':
                txt_name = 'index.text'
            with TemporaryFile(txt_name) as tf:
                TXTOutput.convert(self, oeb_book, tf, input_plugin, opts, log)
                shutil.copy(tf, os.path.join(tdir, txt_name))

            # Images
            for item in oeb_book.manifest:
                if item.media_type in OEB_IMAGES:
                    if hasattr(self.writer, 'images'):
                        path = os.path.join(tdir, 'images')
                        if item.href in self.writer.images:
                            href = self.writer.images[item.href]
                        else:
                            continue
                    else:
                        path = os.path.join(tdir, os.path.dirname(item.href))
                        href = os.path.basename(item.href)
                    if not os.path.exists(path):
                        os.makedirs(path)
                    with open(os.path.join(path, href), 'wb') as imgf:
                        imgf.write(item.data)

            # Metadata
            with open(os.path.join(tdir, 'metadata.opf'), 'wb') as mdataf:
                mdataf.write(etree.tostring(oeb_book.metadata.to_opf1()))

            txtz = ZipFile(output_path, 'w')
            txtz.add_dir(tdir)
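`TXTOutput.convert` above rewrites every newline to the selected style via `specified_newlines` and `TxtNewlines`. Assuming that helper simply normalizes every line-ending variant to one target string, it can be sketched as follows (the function name and `NEWLINES` table here are illustrative stand-ins, not the module's API):

```python
import re

# Illustrative targets corresponding to the NEWLINE_TYPES choices;
# 'system' would resolve to os.linesep in a real implementation.
NEWLINES = {'unix': '\n', 'windows': '\r\n', 'old_mac': '\r'}

# \r\n must be the first alternative so it is consumed as one newline,
# not as \r followed by \n.
_any_newline = re.compile(r'\r\n|\r|\n')


def specified_newlines_sketch(target, txt):
    # Collapse every \r\n / \r / \n to the requested newline type.
    return _any_newline.sub(NEWLINES[target], txt)
```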
ebook_converter/ebooks/conversion/plumber.py (Normal file, 1330 lines)
File diff suppressed because it is too large
ebook_converter/ebooks/conversion/preprocess.py (Normal file, 646 lines)
@@ -0,0 +1,646 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import functools, re, json
from math import ceil

from calibre import entity_to_unicode, as_unicode
from polyglot.builtins import unicode_type, range

XMLDECL_RE = re.compile(r'^\s*<[?]xml.*?[?]>')
SVG_NS = 'http://www.w3.org/2000/svg'
XLINK_NS = 'http://www.w3.org/1999/xlink'

convert_entities = functools.partial(entity_to_unicode,
        result_exceptions={
            '<' : '&lt;',
            '>' : '&gt;',
            "'" : '&apos;',
            '"' : '&quot;',
            '&' : '&amp;',
        })
_span_pat = re.compile('<span.*?</span>', re.DOTALL|re.IGNORECASE)

LIGATURES = {
    # '\u00c6': 'AE',
    # '\u00e6': 'ae',
    # '\u0152': 'OE',
    # '\u0153': 'oe',
    # '\u0132': 'IJ',
    # '\u0133': 'ij',
    # '\u1D6B': 'ue',
    '\uFB00': 'ff',
    '\uFB01': 'fi',
    '\uFB02': 'fl',
    '\uFB03': 'ffi',
    '\uFB04': 'ffl',
    '\uFB05': 'ft',
    '\uFB06': 'st',
}

_ligpat = re.compile('|'.join(LIGATURES))
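The `LIGATURES` table and `_ligpat` pattern above are meant to be used together in a `re.sub` that maps each typographic ligature codepoint back to its letter sequence. A minimal self-contained sketch (the `expand_ligatures` helper name is illustrative, not part of the module):

```python
import re

LIGATURES = {
    '\uFB00': 'ff', '\uFB01': 'fi', '\uFB02': 'fl',
    '\uFB03': 'ffi', '\uFB04': 'ffl', '\uFB05': 'ft', '\uFB06': 'st',
}
# Each key is a single codepoint, so a plain alternation matches any of them.
_ligpat = re.compile('|'.join(LIGATURES))


def expand_ligatures(text):
    # Replace each ligature character with its ASCII letter sequence.
    return _ligpat.sub(lambda m: LIGATURES[m.group()], text)
```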
def sanitize_head(match):
    x = match.group(1)
    x = _span_pat.sub('', x)
    return '<head>\n%s\n</head>' % x


def chap_head(match):
    chap = match.group('chap')
    title = match.group('title')
    if not title:
        return '<h1>'+chap+'</h1><br/>\n'
    else:
        return '<h1>'+chap+'</h1>\n<h3>'+title+'</h3>\n'


def wrap_lines(match):
    ital = match.group('ital')
    if not ital:
        return ' '
    else:
        return ital+' '


def smarten_punctuation(html, log=None):
    from calibre.utils.smartypants import smartyPants
    from calibre.ebooks.chardet import substitute_entites
    from calibre.ebooks.conversion.utils import HeuristicProcessor
    preprocessor = HeuristicProcessor(log=log)
    from uuid import uuid4
    start = 'calibre-smartypants-'+unicode_type(uuid4())
    stop = 'calibre-smartypants-'+unicode_type(uuid4())
    html = html.replace('<!--', start)
    html = html.replace('-->', stop)
    html = preprocessor.fix_nbsp_indents(html)
    html = smartyPants(html)
    html = html.replace(start, '<!--')
    html = html.replace(stop, '-->')
    return substitute_entites(html)


class DocAnalysis(object):
    '''
    Provides various text analysis functions to determine how the document is structured.
    format is the type of document the analysis will be done against.
    raw is the raw text used to determine the line length to use for wrapping.
    Blank lines are excluded from analysis.
    '''

    def __init__(self, format='html', raw=''):
        raw = raw.replace('&nbsp;', ' ')
        if format == 'html':
            linere = re.compile(r'(?<=<p)(?![^>]*>\s*</p>).*?(?=</p>)', re.DOTALL)
        elif format == 'pdf':
            linere = re.compile(r'(?<=<br>)(?!\s*<br>).*?(?=<br>)', re.DOTALL)
        elif format == 'spanned_html':
            linere = re.compile('(?<=<span).*?(?=</span>)', re.DOTALL)
        elif format == 'txt':
            linere = re.compile('.*?\n')
        self.lines = linere.findall(raw)

    def line_length(self, percent):
        '''
        Analyses the document to find the median line length.
        percent is a decimal number, 0 - 1, which is used to determine
        how far into the list of line lengths to go. The list of line lengths is
        ordered smallest to largest and does not include duplicates. 0.5 is the
        median value.
        '''
        lengths = []
        for line in self.lines:
            if len(line) > 0:
                lengths.append(len(line))

        if not lengths:
            return 0

        lengths = list(set(lengths))
        total = sum(lengths)
        avg = total / len(lengths)
        max_line = ceil(avg * 2)

        lengths = sorted(lengths)
        for i in range(len(lengths) - 1, -1, -1):
            if lengths[i] > max_line:
                del lengths[i]

        if percent > 1:
            percent = 1
        if percent < 0:
            percent = 0

        index = int(len(lengths) * percent) - 1

        return lengths[index]

    def line_histogram(self, percent):
        '''
        Creates a broad histogram of the document to determine whether it incorporates hard
        line breaks. Lines are sorted into 20 'buckets' based on length.
        percent is the percentage of lines that should be in a single bucket to return True.
        The majority of the lines will exist in 1-2 buckets in typical docs with hard line breaks.
        '''
        minLineLength = 20    # Ignore lines under 20 chars (typical of spaces)
        maxLineLength = 1900  # Discard larger than this to stay in range
        buckets = 20          # Each line is divided into a bucket based on length

        # print("there are "+unicode_type(len(lines))+" lines")
        # max = 0
        # for line in self.lines:
        #     l = len(line)
        #     if l > max:
        #         max = l
        # print("max line found is "+unicode_type(max))
        # Build the line length histogram
        hRaw = [0 for i in range(0, buckets)]
        for line in self.lines:
            l = len(line)
            if l > minLineLength and l < maxLineLength:
                l = int(l // 100)
                # print("adding "+unicode_type(l))
                hRaw[l] += 1

        # Normalize the histogram into percents
        totalLines = len(self.lines)
        if totalLines > 0:
            h = [float(count)/totalLines for count in hRaw]
        else:
            h = []
        # print("\nhRaw histogram lengths are: "+unicode_type(hRaw))
        # print("            percents are: "+unicode_type(h)+"\n")

        # Find the biggest bucket
        maxValue = 0
        for i in range(0, len(h)):
            if h[i] > maxValue:
                maxValue = h[i]

        if maxValue < percent:
            # print("Line lengths are too variable. Not unwrapping.")
            return False
        else:
            # print(unicode_type(maxValue)+" of the lines were in one bucket")
            return True


class Dehyphenator(object):
    '''
    Analyzes words to determine whether hyphens should be retained/removed. Uses the document
    itself as a dictionary. This method handles all languages along with uncommon, made-up, and
    scientific words. The primary disadvantage is that words appearing only once in the document
    retain hyphens.
    '''

    def __init__(self, verbose=0, log=None):
        self.log = log
        self.verbose = verbose
        # Add common suffixes to the regex below to increase the likelihood of a match -
        # don't add suffixes which are also complete words, such as 'able' or 'sex'
        # only remove if it's not already the point of hyphenation
        self.suffix_string = (
            "((ed)?ly|'?e?s||a?(t|s)?ion(s|al(ly)?)?|ings?|er|(i)?ous|"
            "(i|a)ty|(it)?ies|ive|gence|istic(ally)?|(e|a)nce|m?ents?|ism|ated|"
            "(e|u)ct(ed)?|ed|(i|ed)?ness|(e|a)ncy|ble|ier|al|ex|ian)$")
        self.suffixes = re.compile(r"^%s" % self.suffix_string, re.IGNORECASE)
        self.removesuffixes = re.compile(r"%s" % self.suffix_string, re.IGNORECASE)
        # remove prefixes if the prefix was not already the point of hyphenation
        self.prefix_string = '^(dis|re|un|in|ex)'
        self.prefixes = re.compile(r'%s$' % self.prefix_string, re.IGNORECASE)
        self.removeprefix = re.compile(r'%s' % self.prefix_string, re.IGNORECASE)

    def dehyphenate(self, match):
        firsthalf = match.group('firstpart')
        secondhalf = match.group('secondpart')
        try:
            wraptags = match.group('wraptags')
        except Exception:
            wraptags = ''
        hyphenated = unicode_type(firsthalf) + "-" + unicode_type(secondhalf)
        dehyphenated = unicode_type(firsthalf) + unicode_type(secondhalf)
        if self.suffixes.match(secondhalf) is None:
            lookupword = self.removesuffixes.sub('', dehyphenated)
        else:
            lookupword = dehyphenated
        if len(firsthalf) > 4 and self.prefixes.match(firsthalf) is None:
            lookupword = self.removeprefix.sub('', lookupword)
        if self.verbose > 2:
            self.log("lookup word is: "+lookupword+", orig is: " + hyphenated)
        try:
            searchresult = self.html.find(lookupword.lower())
        except Exception:
            return hyphenated
        if self.format == 'html_cleanup' or self.format == 'txt_cleanup':
            if self.html.find(lookupword) != -1 or searchresult != -1:
                if self.verbose > 2:
                    self.log("    Cleanup:returned dehyphenated word: " + dehyphenated)
                return dehyphenated
            elif self.html.find(hyphenated) != -1:
                if self.verbose > 2:
                    self.log("    Cleanup:returned hyphenated word: " + hyphenated)
                return hyphenated
            else:
                if self.verbose > 2:
                    self.log("    Cleanup:returning original text "+firsthalf+" + linefeed "+secondhalf)
                return firsthalf+'\u2014'+wraptags+secondhalf

        else:
            if self.format == 'individual_words' and len(firsthalf) + len(secondhalf) <= 6:
                if self.verbose > 2:
                    self.log("too short, returned hyphenated word: " + hyphenated)
                return hyphenated
            if len(firsthalf) <= 2 and len(secondhalf) <= 2:
                if self.verbose > 2:
                    self.log("too short, returned hyphenated word: " + hyphenated)
                return hyphenated
            if self.html.find(lookupword) != -1 or searchresult != -1:
                if self.verbose > 2:
                    self.log("    returned dehyphenated word: " + dehyphenated)
                return dehyphenated
            else:
                if self.verbose > 2:
                    self.log("    returned hyphenated word: " + hyphenated)
                return hyphenated

    def __call__(self, html, format, length=1):
        self.html = html
        self.format = format
        if format == 'html':
            intextmatch = re.compile((
                r'(?<=.{%i})(?P<firstpart>[^\W\-]+)(-|‐)\s*(?=<)(?P<wraptags>(</span>)?'
                r'\s*(</[iubp]>\s*){1,2}(?P<up2threeblanks><(p|div)[^>]*>\s*(<p[^>]*>\s*</p>\s*)'
                r'?</(p|div)>\s+){0,3}\s*(<[iubp][^>]*>\s*){1,2}(<span[^>]*>)?)\s*(?P<secondpart>[\w\d]+)') % length)
        elif format == 'pdf':
            intextmatch = re.compile((
                r'(?<=.{%i})(?P<firstpart>[^\W\-]+)(-|‐)\s*(?P<wraptags><p>|'
                r'</[iub]>\s*<p>\s*<[iub]>)\s*(?P<secondpart>[\w\d]+)') % length)
        elif format == 'txt':
            intextmatch = re.compile(
                '(?<=.{%i})(?P<firstpart>[^\\W\\-]+)(-|‐)(\u0020|\u0009)*(?P<wraptags>(\n(\u0020|\u0009)*)+)(?P<secondpart>[\\w\\d]+)' % length)
        elif format == 'individual_words':
            intextmatch = re.compile(
                r'(?!<)(?P<firstpart>[^\W\-]+)(-|‐)\s*(?P<secondpart>\w+)(?![^<]*?>)', re.UNICODE)
        elif format == 'html_cleanup':
            intextmatch = re.compile(
                r'(?P<firstpart>[^\W\-]+)(-|‐)\s*(?=<)(?P<wraptags></span>\s*(</[iubp]>'
                r'\s*<[iubp][^>]*>\s*)?<span[^>]*>|</[iubp]>\s*<[iubp][^>]*>)?\s*(?P<secondpart>[\w\d]+)')
|
||||||
|
elif format == 'txt_cleanup':
|
||||||
|
intextmatch = re.compile(
|
||||||
|
r'(?P<firstpart>[^\W\-]+)(-|‐)(?P<wraptags>\s+)(?P<secondpart>[\w\d]+)')
|
||||||
|
|
||||||
|
html = intextmatch.sub(self.dehyphenate, html)
|
||||||
|
return html
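The core trick in `dehyphenate` is to join a word split across a line break only when the joined spelling already occurs elsewhere in the document. A minimal standalone sketch of that idea (the helper name `dehyphenate_text` and the simplified regex are illustrative, not part of the class above):

```python
import re


def dehyphenate_text(text):
    # Join "key-\nboard" into "keyboard" only when the joined form
    # already appears somewhere else in the text; otherwise keep the
    # hyphen, since it may be a genuine compound like "stop-gap".
    pat = re.compile(r'(?P<first>\w+)-\n(?P<second>\w+)')

    def repl(m):
        joined = m.group('first') + m.group('second')
        if joined.lower() in text.lower():
            return joined
        return m.group('first') + '-' + m.group('second')

    return pat.sub(repl, text)
```

The real class refines this with suffix/prefix stripping so that, e.g., a plural elsewhere in the text still validates the singular joined form.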


class CSSPreProcessor(object):

    # Remove some of the broken CSS that Microsoft products create
    MS_PAT = re.compile(r'''
        (?P<start>^|;|\{)\s*    # The end of the previous rule or block start
        (%s).+?                 # The invalid selectors
        (?P<end>$|;|\})         # The end of the declaration
        ''' % 'mso-|panose-|text-underline|tab-interval',
        re.MULTILINE|re.IGNORECASE|re.VERBOSE)

    def ms_sub(self, match):
        end = match.group('end')
        try:
            start = match.group('start')
        except IndexError:
            start = ''
        if end == ';':
            end = ''
        return start + end

    def __call__(self, data, add_namespace=False):
        from calibre.ebooks.oeb.base import XHTML_CSS_NAMESPACE
        data = self.MS_PAT.sub(self.ms_sub, data)
        if not add_namespace:
            return data

        # Remove comments as the following namespace logic will break if there
        # are commented lines before the first @import or @charset rule. Since
        # the conversion will remove all stylesheets anyway, we don't lose
        # anything
        data = re.sub(unicode_type(r'/\*.*?\*/'), '', data, flags=re.DOTALL)

        ans, namespaced = [], False
        for line in data.splitlines():
            ll = line.lstrip()
            if not (namespaced or ll.startswith('@import') or not ll or
                    ll.startswith('@charset')):
                ans.append(XHTML_CSS_NAMESPACE.strip())
                namespaced = True
            ans.append(line)

        return '\n'.join(ans)
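`MS_PAT` drops a whole declaration whenever its property name starts with a known Microsoft-only prefix, while `ms_sub` preserves the delimiter that opened the match. The same pattern in isolation (a compact sketch without the `re.VERBOSE` comments; `strip_ms_css` is a hypothetical helper name):

```python
import re

# Property prefixes that mark Microsoft-generated CSS, as in MS_PAT above.
MS_PROPS = 'mso-|panose-|text-underline|tab-interval'
ms_pat = re.compile(
    r'(?P<start>^|;|\{)\s*(%s).+?(?P<end>$|;|\})' % MS_PROPS,
    re.MULTILINE | re.IGNORECASE)


def strip_ms_css(css):
    def repl(m):
        # Keep the delimiter that preceded the bad declaration; drop a
        # trailing ';' so we don't leave empty ';;' pairs behind.
        end = m.group('end')
        if end == ';':
            end = ''
        return (m.group('start') or '') + end
    return ms_pat.sub(repl, css)
```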


def accent_regex(accent_maps, letter_before=False):
    accent_cat = set()
    letters = set()

    for accent in tuple(accent_maps):
        accent_cat.add(accent)
        k, v = accent_maps[accent].split(':', 1)
        if len(k) != len(v):
            raise ValueError('Invalid mapping for: {} -> {}'.format(k, v))
        accent_maps[accent] = lmap = dict(zip(k, v))
        letters |= set(lmap)

    if letter_before:
        args = ''.join(letters), ''.join(accent_cat)
        accent_group, letter_group = 2, 1
    else:
        args = ''.join(accent_cat), ''.join(letters)
        accent_group, letter_group = 1, 2

    pat = re.compile(r'([{}])\s*(?:<br[^>]*>){{0,1}}\s*([{}])'.format(*args), re.UNICODE)

    def sub(m):
        lmap = accent_maps[m.group(accent_group)]
        return lmap.get(m.group(letter_group)) or m.group()

    return pat, sub
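`accent_regex` builds a pattern that matches a free-standing accent mark next to a base letter (optionally separated by a `<br>`) and a callback that collapses the pair into the precomposed character. What it builds for a single mapping entry, written out by hand:

```python
import re

# Standalone sketch of one accent_regex() entry: the cedilla mapping
# 'cC:çÇ' becomes a base-letter -> precomposed-character dict.
k, v = 'cC:çÇ'.split(':', 1)
lmap = dict(zip(k, v))  # {'c': 'ç', 'C': 'Ç'}
pat = re.compile(r'([¸])\s*(?:<br[^>]*>)?\s*([%s])' % ''.join(lmap))


def fix(m):
    # Replace accent + letter with the precomposed form; if the letter
    # is somehow unmapped, leave the match untouched.
    return lmap.get(m.group(2)) or m.group()
```

Applied to pdftohtml output where the cedilla was emitted as a separate glyph, `pat.sub(fix, 'Gar¸con')` yields `'Garçon'`.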


def html_preprocess_rules():
    ans = getattr(html_preprocess_rules, 'ans', None)
    if ans is None:
        ans = html_preprocess_rules.ans = [
            # Remove huge blocks of contiguous spaces as they slow down
            # the following regexes pretty badly
            (re.compile(r'\s{10000,}'), ''),
            # Some idiotic HTML generators (Frontpage I'm looking at you)
            # Put all sorts of crap into <head>. This messes up lxml
            (re.compile(r'<head[^>]*>\n*(.*?)\n*</head>', re.IGNORECASE|re.DOTALL),
             sanitize_head),
            # Convert all entities, since lxml doesn't handle them well
            (re.compile(r'&(\S+?);'), convert_entities),
            # Remove the <![if/endif tags inserted by everybody's darling, MS Word
            (re.compile(r'</{0,1}!\[(end){0,1}if\]{0,1}>', re.IGNORECASE), ''),
        ]
    return ans
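`html_preprocess_rules`, `pdftohtml_rules`, and `book_designer_rules` all share one idiom: the regex list is compiled on first call and cached as an attribute on the function object itself, so module import stays cheap and repeat calls are free. The idiom in isolation (with a single illustrative rule):

```python
import re


def cached_rules():
    # Compile-once cache stored on the function object, mirroring the
    # html_preprocess_rules()/pdftohtml_rules() idiom above.
    ans = getattr(cached_rules, 'ans', None)
    if ans is None:
        ans = cached_rules.ans = [
            # Example rule: collapse runs of whitespace to a single space.
            (re.compile(r'\s{2,}'), ' '),
        ]
    return ans
```

Every call after the first returns the exact same list object, so the compiled patterns are shared process-wide.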


def pdftohtml_rules():
    ans = getattr(pdftohtml_rules, 'ans', None)
    if ans is None:
        ans = pdftohtml_rules.ans = [
            accent_regex({
                '¨': 'aAeEiIoOuU:äÄëËïÏöÖüÜ',
                '`': 'aAeEiIoOuU:àÀèÈìÌòÒùÙ',
                '´': 'aAcCeEiIlLoOnNrRsSuUzZ:áÁćĆéÉíÍĺĹóÓńŃŕŔśŚúÚźŹ',
                'ˆ': 'aAeEiIoOuU:âÂêÊîÎôÔûÛ',
                '¸': 'cC:çÇ',
                '˛': 'aAeE:ąĄęĘ',
                '˙': 'zZ:żŻ',
                'ˇ': 'cCdDeElLnNrRsStTzZ:čČďĎěĚľĽňŇřŘšŠťŤžŽ',
                '°': 'uU:ůŮ',
            }),

            accent_regex({'`': 'aAeEiIoOuU:àÀèÈìÌòÒùÙ'}, letter_before=True),

            # If the pdf was printed from a browser, the header/footer has a reliable pattern
            (re.compile(r'((?<=</a>)\s*file:/{2,4}[A-Z].*<br>|file:////?[A-Z].*<br>(?=\s*<hr>))', re.IGNORECASE), lambda match: ''),

            # Center separator lines
            (re.compile(r'<br>\s*(?P<break>([*#•✦=] *){3,})\s*<br>'), lambda match: '<p>\n<p style="text-align:center">' + match.group('break') + '</p>'),

            # Remove <hr> tags
            (re.compile(r'<hr.*?>', re.IGNORECASE), ''),

            # Remove gray background
            (re.compile(r'<BODY[^<>]+>'), '<BODY>'),

            # Convert line breaks to paragraphs
            (re.compile(r'<br[^>]*>\s*'), '</p>\n<p>'),
            (re.compile(r'<body[^>]*>\s*'), '<body>\n<p>'),
            (re.compile(r'\s*</body>'), '</p>\n</body>'),

            # Clean up spaces
            (re.compile(r'(?<=[\.,;\?!”"\'])[\s^ ]*(?=<)'), ' '),
            # Add space before and after italics
            (re.compile(r'(?<!“)<i>'), ' <i>'),
            (re.compile(r'</i>(?=\w)'), '</i> '),
        ]
    return ans


def book_designer_rules():
    ans = getattr(book_designer_rules, 'ans', None)
    if ans is None:
        ans = book_designer_rules.ans = [
            # HR
            (re.compile('<hr>', re.IGNORECASE),
             lambda match: '<span style="page-break-after:always"> </span>'),
            # Create header tags
            (re.compile(r'<h2[^><]*?id=BookTitle[^><]*?(align=)*(?(1)(\w+))*[^><]*?>([^><]*?)</h2>', re.IGNORECASE),
             lambda match: '<h1 id="BookTitle" align="%s">%s</h1>'%(match.group(2) if match.group(2) else 'center', match.group(3))),
            (re.compile(r'<h2[^><]*?id=BookAuthor[^><]*?(align=)*(?(1)(\w+))*[^><]*?>([^><]*?)</h2>', re.IGNORECASE),
             lambda match: '<h2 id="BookAuthor" align="%s">%s</h2>'%(match.group(2) if match.group(2) else 'center', match.group(3))),
            (re.compile('<span[^><]*?id=title[^><]*?>(.*?)</span>', re.IGNORECASE|re.DOTALL),
             lambda match: '<h2 class="title">%s</h2>'%(match.group(1),)),
            (re.compile('<span[^><]*?id=subtitle[^><]*?>(.*?)</span>', re.IGNORECASE|re.DOTALL),
             lambda match: '<h3 class="subtitle">%s</h3>'%(match.group(1),)),
        ]
    return ans


class HTMLPreProcessor(object):

    def __init__(self, log=None, extra_opts=None, regex_wizard_callback=None):
        self.log = log
        self.extra_opts = extra_opts
        self.regex_wizard_callback = regex_wizard_callback
        self.current_href = None

    def is_baen(self, src):
        return re.compile(r'<meta\s+name="Publisher"\s+content=".*?Baen.*?"',
                          re.IGNORECASE).search(src) is not None

    def is_book_designer(self, raw):
        return re.search('<H2[^><]*id=BookTitle', raw) is not None

    def is_pdftohtml(self, src):
        return '<!-- created by calibre\'s pdftohtml -->' in src[:1000]

    def __call__(self, html, remove_special_chars=None,
                 get_preprocess_html=False):
        if remove_special_chars is not None:
            html = remove_special_chars.sub('', html)
        html = html.replace('\0', '')
        is_pdftohtml = self.is_pdftohtml(html)
        if self.is_baen(html):
            rules = []
        elif self.is_book_designer(html):
            rules = book_designer_rules()
        elif is_pdftohtml:
            rules = pdftohtml_rules()
        else:
            rules = []

        start_rules = []

        if not getattr(self.extra_opts, 'keep_ligatures', False):
            html = _ligpat.sub(lambda m: LIGATURES[m.group()], html)

        user_sr_rules = {}
        # Function for processing search and replace

        def do_search_replace(search_pattern, replace_txt):
            from calibre.ebooks.conversion.search_replace import compile_regular_expression
            try:
                search_re = compile_regular_expression(search_pattern)
                if not replace_txt:
                    replace_txt = ''
                rules.insert(0, (search_re, replace_txt))
                user_sr_rules[(search_re, replace_txt)] = search_pattern
            except Exception as e:
                self.log.error('Failed to parse %r regexp because %s' %
                               (search_pattern, as_unicode(e)))

        # search / replace using the sr?_search / sr?_replace options
        for i in range(1, 4):
            search, replace = 'sr%d_search'%i, 'sr%d_replace'%i
            search_pattern = getattr(self.extra_opts, search, '')
            replace_txt = getattr(self.extra_opts, replace, '')
            if search_pattern:
                do_search_replace(search_pattern, replace_txt)

        # multi-search / replace using the search_replace option
        search_replace = getattr(self.extra_opts, 'search_replace', None)
        if search_replace:
            search_replace = json.loads(search_replace)
            for search_pattern, replace_txt in reversed(search_replace):
                do_search_replace(search_pattern, replace_txt)

        end_rules = []
        # delete soft hyphens - moved here so it's executed after header/footer removal
        if is_pdftohtml:
            # unwrap/delete soft hyphens (\xad is the soft hyphen character)
            end_rules.append((re.compile(
                r'[\xad](</p>\s*<p>\s*)+\s*(?=[\[a-z\d])'), lambda match: ''))
            # unwrap/delete soft hyphens with formatting
            end_rules.append((re.compile(
                r'[\xad]\s*(</(i|u|b)>)+(</p>\s*<p>\s*)+\s*(<(i|u|b)>)+\s*(?=[\[a-z\d])'), lambda match: ''))

        length = -1
        if getattr(self.extra_opts, 'unwrap_factor', 0.0) > 0.01:
            docanalysis = DocAnalysis('pdf', html)
            length = docanalysis.line_length(getattr(self.extra_opts, 'unwrap_factor'))
            if length:
                # print("The pdf line length returned is " + unicode_type(length))
                # unwrap em/en dashes
                end_rules.append((re.compile(
                    r'(?<=.{%i}[–—])\s*<p>\s*(?=[\[a-z\d])' % length), lambda match: ''))
                end_rules.append(
                    # Unwrap using punctuation
                    (re.compile((
                        r'(?<=.{%i}([a-zäëïöüàèìòùáćéíĺóŕńśúýâêîôûçąężıãõñæøþðßěľščťžňďřů,:)\\IAß]'
                        r'|(?<!\&\w{4});))\s*(?P<ital></(i|b|u)>)?\s*(</p>\s*<p>\s*)+\s*(?=(<(i|b|u)>)?'
                        r'\s*[\w\d$(])') % length, re.UNICODE), wrap_lines),
                )

        for rule in html_preprocess_rules() + start_rules:
            html = rule[0].sub(rule[1], html)

        if self.regex_wizard_callback is not None:
            self.regex_wizard_callback(self.current_href, html)

        if get_preprocess_html:
            return html

        def dump(raw, where):
            import os
            dp = getattr(self.extra_opts, 'debug_pipeline', None)
            if dp and os.path.exists(dp):
                odir = os.path.join(dp, 'input')
                if os.path.exists(odir):
                    odir = os.path.join(odir, where)
                    if not os.path.exists(odir):
                        os.makedirs(odir)
                    name, i = None, 0
                    while not name or os.path.exists(os.path.join(odir, name)):
                        i += 1
                        name = '%04d.html'%i
                    with open(os.path.join(odir, name), 'wb') as f:
                        f.write(raw.encode('utf-8'))

        # dump(html, 'pre-preprocess')

        for rule in rules + end_rules:
            try:
                html = rule[0].sub(rule[1], html)
            except Exception as e:
                if rule in user_sr_rules:
                    self.log.error(
                        'User supplied search & replace rule: %s -> %s '
                        'failed with error: %s, ignoring.'%(
                            user_sr_rules[rule], rule[1], e))
                else:
                    raise

        if is_pdftohtml and length > -1:
            # Dehyphenate
            dehyphenator = Dehyphenator(self.extra_opts.verbose, self.log)
            html = dehyphenator(html, 'html', length)

        if is_pdftohtml:
            from calibre.ebooks.conversion.utils import HeuristicProcessor
            pdf_markup = HeuristicProcessor(self.extra_opts, None)
            totalwords = 0
            if pdf_markup.get_word_count(html) > 7000:
                html = pdf_markup.markup_chapters(html, totalwords, True)

        # dump(html, 'post-preprocess')

        # Handle broken XHTML w/ SVG (ugh)
        if 'svg:' in html and SVG_NS not in html:
            html = html.replace(
                '<html', '<html xmlns:svg="%s"' % SVG_NS, 1)
        if 'xlink:' in html and XLINK_NS not in html:
            html = html.replace(
                '<html', '<html xmlns:xlink="%s"' % XLINK_NS, 1)

        html = XMLDECL_RE.sub('', html)

        if getattr(self.extra_opts, 'asciiize', False):
            from calibre.utils.localization import get_udc
            from calibre.utils.mreplace import MReplace
            unihandecoder = get_udc()
            mr = MReplace(data={'«':'<'*3, '»':'>'*3})
            html = mr.mreplace(html)
            html = unihandecoder.decode(html)

        if getattr(self.extra_opts, 'enable_heuristics', False):
            from calibre.ebooks.conversion.utils import HeuristicProcessor
            preprocessor = HeuristicProcessor(self.extra_opts, self.log)
            html = preprocessor(html)

        if is_pdftohtml:
            html = html.replace('<!-- created by calibre\'s pdftohtml -->', '')

        if getattr(self.extra_opts, 'smarten_punctuation', False):
            html = smarten_punctuation(html, self.log)

        try:
            unsupported_unicode_chars = self.extra_opts.output_profile.unsupported_unicode_chars
        except AttributeError:
            unsupported_unicode_chars = ''
        if unsupported_unicode_chars:
            from calibre.utils.localization import get_udc
            unihandecoder = get_udc()
            for char in unsupported_unicode_chars:
                asciichar = unihandecoder.decode(char)
                html = html.replace(char, asciichar)

        return html

881	ebook_converter/ebooks/conversion/utils.py	Normal file
@@ -0,0 +1,881 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2010, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import re
from math import ceil
from calibre.ebooks.conversion.preprocess import DocAnalysis, Dehyphenator
from calibre.utils.logging import default_log
from calibre.utils.wordcount import get_wordcount_obj
from polyglot.builtins import unicode_type


class HeuristicProcessor(object):

    def __init__(self, extra_opts=None, log=None):
        self.log = default_log if log is None else log
        self.html_preprocess_sections = 0
        self.found_indents = 0
        self.extra_opts = extra_opts
        self.deleted_nbsps = False
        self.totalwords = 0
        self.min_chapters = 1
        self.chapters_no_title = 0
        self.chapters_with_title = 0
        self.blanks_deleted = False
        self.blanks_between_paragraphs = False
        self.linereg = re.compile('(?<=<p).*?(?=</p>)', re.IGNORECASE|re.DOTALL)
        self.blankreg = re.compile(r'\s*(?P<openline><p(?!\sclass=\"(softbreak|whitespace)\")[^>]*>)\s*(?P<closeline></p>)', re.IGNORECASE)
        self.anyblank = re.compile(r'\s*(?P<openline><p[^>]*>)\s*(?P<closeline></p>)', re.IGNORECASE)
        self.multi_blank = re.compile(r'(\s*<p[^>]*>\s*</p>(\s*<div[^>]*>\s*</div>\s*)*){2,}(?!\s*<h\d)', re.IGNORECASE)
        self.any_multi_blank = re.compile(r'(\s*<p[^>]*>\s*</p>(\s*<div[^>]*>\s*</div>\s*)*){2,}', re.IGNORECASE)
        self.line_open = (
            r"<(?P<outer>p|div)[^>]*>\s*(<(?P<inner1>font|span|[ibu])[^>]*>)?\s*"
            r"(<(?P<inner2>font|span|[ibu])[^>]*>)?\s*(<(?P<inner3>font|span|[ibu])[^>]*>)?\s*")
        self.line_close = "(</(?P=inner3)>)?\\s*(</(?P=inner2)>)?\\s*(</(?P=inner1)>)?\\s*</(?P=outer)>"
        self.single_blank = re.compile(r'(\s*<(p|div)[^>]*>\s*</(p|div)>)', re.IGNORECASE)
        self.scene_break_open = '<p class="scenebreak" style="text-align:center; text-indent:0%; margin-top:1em; margin-bottom:1em; page-break-before:avoid">'
        self.common_in_text_endings = '[\"\'—’”,\\.!\\?\\…\\)„\\w]'
        self.common_in_text_beginnings = '[\\w\'\"“‘‛]'

    def is_pdftohtml(self, src):
        return '<!-- created by calibre\'s pdftohtml -->' in src[:1000]

    def is_abbyy(self, src):
        return '<meta name="generator" content="ABBYY FineReader' in src[:1000]

    def chapter_head(self, match):
        from calibre.utils.html2text import html2text
        chap = match.group('chap')
        title = match.group('title')
        if not title:
            self.html_preprocess_sections = self.html_preprocess_sections + 1
            self.log.debug("marked " + unicode_type(self.html_preprocess_sections) +
                           " chapters. - " + unicode_type(chap))
            return '<h2>'+chap+'</h2>\n'
        else:
            delete_whitespace = re.compile('^\\s*(?P<c>.*?)\\s*$')
            delete_quotes = re.compile('\'\"')
            txt_chap = delete_quotes.sub('', delete_whitespace.sub('\\g<c>', html2text(chap)))
            txt_title = delete_quotes.sub('', delete_whitespace.sub('\\g<c>', html2text(title)))
            self.html_preprocess_sections = self.html_preprocess_sections + 1
            self.log.debug("marked " + unicode_type(self.html_preprocess_sections) +
                           " chapters & titles. - " + unicode_type(chap) + ", " + unicode_type(title))
            return '<h2 title="'+txt_chap+', '+txt_title+'">'+chap+'</h2>\n<h3 class="sigilNotInTOC">'+title+'</h3>\n'

    def chapter_break(self, match):
        chap = match.group('section')
        styles = match.group('styles')
        self.html_preprocess_sections = self.html_preprocess_sections + 1
        self.log.debug("marked " + unicode_type(self.html_preprocess_sections) +
                       " section markers based on punctuation. - " + unicode_type(chap))
        return '<'+styles+' style="page-break-before:always">'+chap

    def analyze_title_matches(self, match):
        # chap = match.group('chap')
        title = match.group('title')
        if not title:
            self.chapters_no_title = self.chapters_no_title + 1
        else:
            self.chapters_with_title = self.chapters_with_title + 1

    def insert_indent(self, match):
        pstyle = match.group('formatting')
        tag = match.group('tagtype')
        span = match.group('span')
        self.found_indents = self.found_indents + 1
        if pstyle:
            if pstyle.lower().find('style') != -1:
                pstyle = re.sub(r'"$', '; text-indent:3%"', pstyle)
            else:
                pstyle = pstyle+' style="text-indent:3%"'
            if not span:
                return '<'+tag+' '+pstyle+'>'
            else:
                return '<'+tag+' '+pstyle+'>'+span
        else:
            if not span:
                return '<'+tag+' style="text-indent:3%">'
            else:
                return '<'+tag+' style="text-indent:3%">'+span

    def no_markup(self, raw, percent):
        '''
        Detects how many line endings in the file are marked up. raw is the
        text to inspect; percent is the fraction of line endings that should
        be marked up. Returns True when fewer than that fraction of the line
        endings carry markup (i.e. the file is mostly unmarked).
        '''
        htm_end_ere = re.compile('</(p|div)>', re.DOTALL)
        line_end_ere = re.compile('(\n|\r|\r\n)', re.DOTALL)
        htm_end = htm_end_ere.findall(raw)
        line_end = line_end_ere.findall(raw)
        tot_htm_ends = len(htm_end)
        tot_ln_fds = len(line_end)
        # self.log.debug("There are " + unicode_type(tot_ln_fds) + " total Line feeds, and " +
        #                unicode_type(tot_htm_ends) + " marked up endings")

        if percent > 1:
            percent = 1
        if percent < 0:
            percent = 0

        min_lns = tot_ln_fds * percent
        # self.log.debug("There must be fewer than " + unicode_type(min_lns) + " unmarked lines to add markup")
        return min_lns > tot_htm_ends
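The heuristic above just compares two counts: raw line feeds versus `</p>`/`</div>` endings. A compact standalone sketch of the same test (`mostly_unmarked` is a hypothetical name, not the method above):

```python
import re


def mostly_unmarked(raw, percent):
    # Clamp percent to [0, 1], then report whether fewer than that
    # fraction of the line endings are closed with </p> or </div>.
    percent = min(max(percent, 0), 1)
    tot_htm_ends = len(re.findall(r'</(?:p|div)>', raw))
    tot_ln_fds = len(re.findall(r'\n|\r\n?', raw))
    return tot_ln_fds * percent > tot_htm_ends
```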

    def dump(self, raw, where):
        import os
        dp = getattr(self.extra_opts, 'debug_pipeline', None)
        if dp and os.path.exists(dp):
            odir = os.path.join(dp, 'preprocess')
            if not os.path.exists(odir):
                os.makedirs(odir)
            if os.path.exists(odir):
                odir = os.path.join(odir, where)
                if not os.path.exists(odir):
                    os.makedirs(odir)
                name, i = None, 0
                while not name or os.path.exists(os.path.join(odir, name)):
                    i += 1
                    name = '%04d.html'%i
                with open(os.path.join(odir, name), 'wb') as f:
                    f.write(raw.encode('utf-8'))

    def get_word_count(self, html):
        word_count_text = re.sub(r'(?s)<head[^>]*>.*?</head>', '', html)
        word_count_text = re.sub(r'<[^>]*>', '', word_count_text)
        wordcount = get_wordcount_obj(word_count_text)
        return wordcount.words
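`get_word_count` strips the `<head>` block, then all remaining tags, before counting. The same two-pass stripping with a plain `split()` standing in for calibre's `get_wordcount_obj` helper (`rough_word_count` is an illustrative name):

```python
import re


def rough_word_count(html):
    # First drop the <head> block (titles/styles shouldn't count),
    # then drop every remaining tag, then count whitespace-separated words.
    text = re.sub(r'(?s)<head[^>]*>.*?</head>', '', html)
    text = re.sub(r'<[^>]*>', '', text)
    return len(text.split())
```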

    def markup_italicis(self, html):
        # self.log.debug("\n\n\nitalicize debugging \n\n\n")
        ITALICIZE_WORDS = [
            'Etc.', 'etc.', 'viz.', 'ie.', 'i.e.', 'Ie.', 'I.e.', 'eg.',
            'e.g.', 'Eg.', 'E.g.', 'et al.', 'et cetera', 'n.b.', 'N.b.',
            'nota bene', 'Nota bene', 'Ste.', 'Mme.', 'Mdme.',
            'Mlle.', 'Mons.', 'PS.', 'PPS.',
        ]

        ITALICIZE_STYLE_PATS = [
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])_\*/(?P<words>[^\*_]+)/\*_'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])~~(?P<words>[^~]+)~~'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])_/(?P<words>[^/_]+)/_'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])_\*(?P<words>[^\*_]+)\*_'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])\*/(?P<words>[^/\*]+)/\*'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])/:(?P<words>[^:/]+):/'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])\|:(?P<words>[^:\|]+):\|'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])\*(?P<words>[^\*]+)\*'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])~(?P<words>[^~]+)~'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])/(?P<words>[^/\*><]+)/'),
            unicode_type(r'(?msu)(?<=[\s>"“\'‘])_(?P<words>[^_]+)_'),
        ]

        for word in ITALICIZE_WORDS:
            html = re.sub(r'(?<=\s|>)' + re.escape(word) + r'(?=\s|<)', '<i>%s</i>' % word, html)

        search_text = re.sub(r'(?s)<head[^>]*>.*?</head>', '', html)
        search_text = re.sub(r'<[^>]*>', '', search_text)
        for pat in ITALICIZE_STYLE_PATS:
            for match in re.finditer(pat, search_text):
                ital_string = unicode_type(match.group('words'))
                # self.log.debug("italicising "+unicode_type(match.group(0))+" with <i>"+ital_string+"</i>")
                try:
                    html = re.sub(re.escape(unicode_type(match.group(0))), '<i>%s</i>' % ital_string, html)
                except OverflowError:
                    # match.group(0) was too large to be compiled into a regex
                    continue
                except re.error:
                    # the match was not a valid regular expression
                    continue

        return html

    def markup_chapters(self, html, wordcount, blanks_between_paragraphs):
        '''
        Searches for common chapter headings throughout the document;
        attempts multiple patterns based on likelihood of a match
        with minimum false positives. Exits after finding a successful pattern.
        '''
        # Typical chapters are between 2000 and 7000 words, use the larger number to decide the
        # minimum of chapters to search for. A max limit is calculated to prevent things like OCR
        # or pdf page numbers from being treated as TOC markers
        max_chapters = 150
        typical_chapters = 7000.
        if wordcount > 7000:
            if wordcount > 200000:
                typical_chapters = 15000.
            self.min_chapters = int(ceil(wordcount / typical_chapters))
        self.log.debug("minimum chapters required are: "+unicode_type(self.min_chapters))
        heading = re.compile('<h[1-3][^>]*>', re.IGNORECASE)
        self.html_preprocess_sections = len(heading.findall(html))
        self.log.debug("found " + unicode_type(self.html_preprocess_sections) + " pre-existing headings")

        # Build the Regular Expressions in pieces
        init_lookahead = "(?=<(p|div))"
        chapter_line_open = self.line_open
        title_line_open = (r"<(?P<outer2>p|div)[^>]*>\s*(<(?P<inner4>font|span|[ibu])[^>]*>)?"
                           r"\s*(<(?P<inner5>font|span|[ibu])[^>]*>)?\s*(<(?P<inner6>font|span|[ibu])[^>]*>)?\s*")
        chapter_header_open = r"(?P<chap>"
        title_header_open = r"(?P<title>"
        chapter_header_close = ")\\s*"
        title_header_close = ")"
        chapter_line_close = self.line_close
        title_line_close = "(</(?P=inner6)>)?\\s*(</(?P=inner5)>)?\\s*(</(?P=inner4)>)?\\s*</(?P=outer2)>"

        is_pdftohtml = self.is_pdftohtml(html)
        if is_pdftohtml:
            title_line_open = "<(?P<outer2>p)[^>]*>\\s*"
            title_line_close = "\\s*</(?P=outer2)>"

        if blanks_between_paragraphs:
            blank_lines = "(\\s*<p[^>]*>\\s*</p>){0,2}\\s*"
        else:
            blank_lines = ""
        opt_title_open = "("
        opt_title_close = ")?"
        n_lookahead_open = "(?!\\s*"
        n_lookahead_close = ")\\s*"

        default_title = r"(<[ibu][^>]*>)?\s{0,3}(?!Chapter)([\w\:\'’\"-]+\s{0,3}){1,5}?(</[ibu][^>]*>)?(?=<)"
        simple_title = r"(<[ibu][^>]*>)?\s{0,3}(?!(Chapter|\s+<)).{0,65}?(</[ibu][^>]*>)?(?=<)"

        analysis_result = []

        chapter_types = [
            [(
                r"[^'\"]?(Introduction|Synopsis|Acknowledgements|Epilogue|CHAPTER|Kapitel|Volume\b|Prologue|Book\b|Part\b|Dedication|Preface)"
                r"\s*([\d\w-]+\:?\'?\s*){0,5}"), True, True, True, False, "Searching for common section headings", 'common'],
            # Highest frequency headings which include titles
            [r"[^'\"]?(CHAPTER|Kapitel)\s*([\dA-Z\-\'\"\?!#,]+\s*){0,7}\s*", True, True, True, False, "Searching for most common chapter headings", 'chapter'],
            [r"<b[^>]*>\s*(<span[^>]*>)?\s*(?!([*#•=]+\s*)+)(\s*(?=[\d.\w#\-*\s]+<)([\d.\w#-*]+\s*){1,5}\s*)(?!\.)(</span>)?\s*</b>",
             True, True, True, False, "Searching for emphasized lines", 'emphasized'],  # Emphasized lines
            [r"[^'\"]?(\d+(\.|:))\s*([\w\-\'\"#,]+\s*){0,7}\s*", True, True, True, False,
             "Searching for numeric chapter headings", 'numeric'],  # Numeric Chapters
            [r"([A-Z]\s+){3,}\s*([\d\w-]+\s*){0,3}\s*", True, True, True, False, "Searching for letter spaced headings", 'letter_spaced'],  # Spaced Lettering
            [r"[^'\"]?(\d+\.?\s+([\d\w-]+\:?\'?-?\s?){0,5})\s*", True, True, True, False,
             "Searching for numeric chapters with titles", 'numeric_title'],  # Numeric Titles
            [r"[^'\"]?(\d+)\s*([\dA-Z\-\'\"\?!#,]+\s*){0,7}\s*", True, True, True, False,
             "Searching for simple numeric headings", 'plain_number'],  # Numeric Chapters, no dot or colon
            [r"\s*[^'\"]?([A-Z#]+(\s|-){0,3}){1,5}\s*", False, True, False, False,
             "Searching for chapters with Uppercase Characters", 'uppercase']  # Uppercase Chapters
        ]

        def recurse_patterns(html, analyze):
            # Start with most typical chapter headings, get more aggressive until one works
            for [chapter_type, n_lookahead_req, strict_title, ignorecase, title_req, log_message, type_name] in chapter_types:
                n_lookahead = ''
                hits = 0
                self.chapters_no_title = 0
                self.chapters_with_title = 0

                if n_lookahead_req:
                    lp_n_lookahead_open = n_lookahead_open
                    lp_n_lookahead_close = n_lookahead_close
                else:
                    lp_n_lookahead_open = ''
                    lp_n_lookahead_close = ''

                if strict_title:
                    lp_title = default_title
                else:
                    lp_title = simple_title

                if ignorecase:
                    arg_ignorecase = r'(?i)'
                else:
                    arg_ignorecase = ''

                if title_req:
                    lp_opt_title_open = ''
                    lp_opt_title_close = ''
                else:
                    lp_opt_title_open = opt_title_open
                    lp_opt_title_close = opt_title_close

                if self.html_preprocess_sections >= self.min_chapters:
                    break
                full_chapter_line = chapter_line_open+chapter_header_open+chapter_type+chapter_header_close+chapter_line_close
                if n_lookahead_req:
                    n_lookahead = re.sub("(ou|in|cha)", "lookahead_", full_chapter_line)
                if not analyze:
                    self.log.debug("Marked " + unicode_type(self.html_preprocess_sections) + " headings, " + log_message)

                chapter_marker = arg_ignorecase+init_lookahead+full_chapter_line+blank_lines+lp_n_lookahead_open+n_lookahead+lp_n_lookahead_close+ \
                    lp_opt_title_open+title_line_open+title_header_open+lp_title+title_header_close+title_line_close+lp_opt_title_close
                chapdetect = re.compile(r'%s' % chapter_marker)

                if analyze:
                    hits = len(chapdetect.findall(html))
                    if hits:
                        chapdetect.sub(self.analyze_title_matches, html)
                        if float(self.chapters_with_title) / float(hits) > .5:
                            title_req = True
                            strict_title = False
                        self.log.debug(
                            unicode_type(type_name)+" had "+unicode_type(hits)+
                            " hits - "+unicode_type(self.chapters_no_title)+" chapters with no title, "+
                            unicode_type(self.chapters_with_title)+" chapters with titles, "+
                            unicode_type(float(self.chapters_with_title) / float(hits))+" percent. ")
                        if type_name == 'common':
                            analysis_result.append([chapter_type, n_lookahead_req, strict_title, ignorecase, title_req, log_message, type_name])
                        elif self.min_chapters <= hits < max_chapters or self.min_chapters < 3 > hits:
                            analysis_result.append([chapter_type, n_lookahead_req, strict_title, ignorecase, title_req, log_message, type_name])
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
html = chapdetect.sub(self.chapter_head, html)
|
||||||
|
return html
|
||||||
|
|
||||||
|
recurse_patterns(html, True)
|
||||||
|
chapter_types = analysis_result
|
||||||
|
html = recurse_patterns(html, False)
|
||||||
|
|
||||||
|
words_per_chptr = wordcount
|
||||||
|
if words_per_chptr > 0 and self.html_preprocess_sections > 0:
|
||||||
|
words_per_chptr = wordcount // self.html_preprocess_sections
|
||||||
|
self.log.debug("Total wordcount is: "+ unicode_type(wordcount)+", Average words per section is: "+
|
||||||
|
unicode_type(words_per_chptr)+", Marked up "+unicode_type(self.html_preprocess_sections)+" chapters")
|
||||||
|
return html
|
||||||
|
|
||||||
|
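The heading patterns in the table above are easier to see in action in isolation. A minimal sketch testing just the 'numeric' expression on its own, outside the marker scaffolding the real code wraps around it:

```python
import re

# The 'numeric' pattern from the chapter_types table: a number followed by
# a dot or colon, then up to seven title words.
numeric = re.compile(r"[^'\"]?(\d+(\.|:))\s*([\w\-\'\"#,]+\s*){0,7}\s*")

print(bool(numeric.match("12. The Long Road Home")))  # -> True
print(bool(numeric.match("ordinary prose line")))     # -> False
```

In the real code this core is embedded between line-open/line-close markers and optional title groups before being compiled.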
    def punctuation_unwrap(self, length, content, format):
        '''
        Unwraps lines based on line length and punctuation
        supports a range of html markup and text files

        the lookahead regex below is meant to look for any non-full stop characters - punctuation
        characters which can be used as a full stop should *not* be added below - e.g. ?!“”. etc
        the reason for this is to prevent false positive wrapping. False positives are more
        difficult to detect than false negatives during a manual review of the doc

        This function intentionally leaves hyphenated content alone as that is handled by the
        dehyphenate routine in a separate step
        '''
        def style_unwrap(match):
            style_close = match.group('style_close')
            style_open = match.group('style_open')
            if style_open and style_close:
                return style_close+' '+style_open
            elif style_open and not style_close:
                return ' '+style_open
            elif not style_open and style_close:
                return style_close+' '
            else:
                return ' '

        # define the pieces of the regex
        # (?<!\&\w{4});) is a semicolon not part of an entity
        lookahead = "(?<=.{"+unicode_type(length)+r"}([a-zა-ჰäëïöüàèìòùáćéíĺóŕńśúýâêîôûçąężıãõñæøþðßěľščťžňďřů,:)\\IAß]|(?<!\&\w{4});))"
        em_en_lookahead = "(?<=.{"+unicode_type(length)+"}[\u2013\u2014])"
        soft_hyphen = "\xad"
        line_ending = "\\s*(?P<style_close></(span|[iub])>)?\\s*(</(p|div)>)?"
        blanklines = "\\s*(?P<up2threeblanks><(p|span|div)[^>]*>\\s*(<(p|span|div)[^>]*>\\s*</(span|p|div)>\\s*)</(span|p|div)>\\s*){0,3}\\s*"
        line_opening = "<(p|div)[^>]*>\\s*(?P<style_open><(span|[iub])[^>]*>)?\\s*"
        txt_line_wrap = "((\u0020|\u0009)*\n){1,4}"

        if format == 'txt':
            unwrap_regex = lookahead+txt_line_wrap
            em_en_unwrap_regex = em_en_lookahead+txt_line_wrap
            shy_unwrap_regex = soft_hyphen+txt_line_wrap
        else:
            unwrap_regex = lookahead+line_ending+blanklines+line_opening
            em_en_unwrap_regex = em_en_lookahead+line_ending+blanklines+line_opening
            shy_unwrap_regex = soft_hyphen+line_ending+blanklines+line_opening

        unwrap = re.compile("%s" % unwrap_regex, re.UNICODE)
        em_en_unwrap = re.compile("%s" % em_en_unwrap_regex, re.UNICODE)
        shy_unwrap = re.compile("%s" % shy_unwrap_regex, re.UNICODE)

        if format == 'txt':
            content = unwrap.sub(' ', content)
            content = em_en_unwrap.sub('', content)
            content = shy_unwrap.sub('', content)
        else:
            content = unwrap.sub(style_unwrap, content)
            content = em_en_unwrap.sub(style_unwrap, content)
            content = shy_unwrap.sub(style_unwrap, content)

        return content

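The lookbehind-plus-wrap pattern assembled in punctuation_unwrap is hard to read in its full form. A minimal standalone sketch of the same technique for plain text, with a deliberately simplified character class (the real pattern also covers accented letters, Georgian, and entity-safe semicolons):

```python
import re

def unwrap_txt(text, min_len):
    # Join a hard line break only when at least min_len characters precede
    # it on the same line and the break follows a lower-case letter or a
    # comma, i.e. the sentence is clearly unfinished. Sentence-ending
    # punctuation (. ? !) deliberately never matches, so those breaks stay.
    pattern = re.compile(r"(?<=.{%d}[a-z,])\n" % min_len)
    return pattern.sub(" ", text)

wrapped = "This line was wrapped by the\nconversion at a fixed width.\nShort.\nNext paragraph."
print(unwrap_txt(wrapped, 20))
```

Lines ending in a full stop are left untouched, which is why the approach produces few false positives: only clearly mid-sentence breaks are joined.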
    def txt_process(self, match):
        from calibre.ebooks.txt.processor import convert_basic, separate_paragraphs_single_line
        content = match.group('text')
        content = separate_paragraphs_single_line(content)
        content = convert_basic(content, epub_split_size_kb=0)
        return content

    def markup_pre(self, html):
        pre = re.compile(r'<pre>', re.IGNORECASE)
        if len(pre.findall(html)) >= 1:
            self.log.debug("Running Text Processing")
            outerhtml = re.compile(r'.*?(?<=<pre>)(?P<text>.*?)</pre>', re.IGNORECASE|re.DOTALL)
            html = outerhtml.sub(self.txt_process, html)
            from calibre.ebooks.conversion.preprocess import convert_entities
            html = re.sub(r'&(\S+?);', convert_entities, html)
        else:
            # Add markup naively
            # TODO - find out if there are cases where there are more than one <pre> tag or
            # other types of unmarked html and handle them in some better fashion
            add_markup = re.compile('(?<!>)(\n)')
            html = add_markup.sub('</p>\n<p>', html)
        return html

    def arrange_htm_line_endings(self, html):
        html = re.sub(r"\s*</(?P<tag>p|div)>", "</"+"\\g<tag>"+">\n", html)
        html = re.sub(r"\s*<(?P<tag>p|div)(?P<style>[^>]*)>\s*", "\n<"+"\\g<tag>"+"\\g<style>"+">", html)
        return html

    def fix_nbsp_indents(self, html):
        txtindent = re.compile(unicode_type(r'<(?P<tagtype>p|div)(?P<formatting>[^>]*)>\s*(?P<span>(<span[^>]*>\s*)+)?\s*(\u00a0){2,}'), re.IGNORECASE)
        html = txtindent.sub(self.insert_indent, html)
        if self.found_indents > 1:
            self.log.debug("replaced "+unicode_type(self.found_indents)+" nbsp indents with inline styles")
        return html

    def cleanup_markup(self, html):
        # remove remaining non-breaking spaces
        html = re.sub(unicode_type(r'\u00a0'), ' ', html)
        # Get rid of various common microsoft specific tags which can cause issues later
        # Get rid of empty <o:p> tags to simplify other processing
        html = re.sub(unicode_type(r'\s*<o:p>\s*</o:p>'), ' ', html)
        # Delete microsoft 'smart' tags
        html = re.sub('(?i)</?st1:\\w+>', '', html)
        # Re-open self closing paragraph tags
        html = re.sub('<p[^>/]*/>', '<p> </p>', html)
        # Get rid of empty span, bold, font, em, & italics tags
        fmt_tags = 'font|[ibu]|em|strong'
        open_fmt_pat, close_fmt_pat = r'<(?:{})(?:\s[^>]*)?>'.format(fmt_tags), '</(?:{})>'.format(fmt_tags)
        for i in range(2):
            html = re.sub(r"\s*<span[^>]*>\s*(<span[^>]*>\s*</span>){0,2}\s*</span>\s*", " ", html)
            html = re.sub(
                r"\s*{open}\s*({open}\s*{close}\s*){{0,2}}\s*{close}".format(open=open_fmt_pat, close=close_fmt_pat), " ", html)
        # delete surrounding divs from empty paragraphs
        html = re.sub('<div[^>]*>\\s*<p[^>]*>\\s*</p>\\s*</div>', '<p> </p>', html)
        # Empty heading tags
        html = re.sub(r'(?i)<h\d+>\s*</h\d+>', '', html)
        self.deleted_nbsps = True
        return html

    def analyze_line_endings(self, html):
        '''
        determines the type of html line ending used most commonly in a document
        use before calling docanalysis functions
        '''
        paras_reg = re.compile('<p[^>]*>', re.IGNORECASE)
        spans_reg = re.compile('<span[^>]*>', re.IGNORECASE)
        paras = len(paras_reg.findall(html))
        spans = len(spans_reg.findall(html))
        if spans > 1:
            if float(paras) / float(spans) < 0.75:
                return 'spanned_html'
            else:
                return 'html'
        else:
            return 'html'
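
The span-vs-paragraph ratio test above reads more clearly on its own. A minimal standalone sketch of the same heuristic (hypothetical helper name):

```python
import re

def line_ending_style(html):
    # Count opening <p> and <span> tags; when spans clearly dominate
    # (fewer than 0.75 paragraphs per span) the file is assumed to use
    # spans for hard line breaks, the same threshold used above.
    paras = len(re.findall(r'<p[^>]*>', html, re.IGNORECASE))
    spans = len(re.findall(r'<span[^>]*>', html, re.IGNORECASE))
    if spans > 1 and paras / spans < 0.75:
        return 'spanned_html'
    return 'html'

print(line_ending_style('<p><span>one</span><span>two</span></p>'))  # -> spanned_html
```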

    def analyze_blanks(self, html):
        blanklines = self.blankreg.findall(html)
        lines = self.linereg.findall(html)
        if len(lines) > 1:
            self.log.debug("There are " + unicode_type(len(blanklines)) + " blank lines. " +
                           unicode_type(float(len(blanklines)) / float(len(lines))) + " percent blank")

            if float(len(blanklines)) / float(len(lines)) > 0.40:
                return True
            else:
                return False

    def cleanup_required(self):
        for option in ['unwrap_lines', 'markup_chapter_headings', 'format_scene_breaks', 'delete_blank_paragraphs']:
            if getattr(self.extra_opts, option, False):
                return True
        return False

    def merge_blanks(self, html, blanks_count=None):
        base_em = .5  # Baseline is 1.5em per blank line, 1st line is .5 em css and 1em for the nbsp
        em_per_line = 1.5  # Add another 1.5 em for each additional blank

        def merge_matches(match):
            to_merge = match.group(0)
            lines = float(len(self.single_blank.findall(to_merge))) - 1.
            em = base_em + (em_per_line * lines)
            if to_merge.find('whitespace') != -1:  # str.find returns -1 (truthy) on a miss, so test explicitly
                newline = self.any_multi_blank.sub('\n<p class="whitespace'+unicode_type(int(em * 10))+
                                                   '" style="text-align:center; margin-top:'+unicode_type(em)+'em"> </p>', match.group(0))
            else:
                newline = self.any_multi_blank.sub('\n<p class="softbreak'+unicode_type(int(em * 10))+
                                                   '" style="text-align:center; margin-top:'+unicode_type(em)+'em"> </p>', match.group(0))
            return newline

        html = self.any_multi_blank.sub(merge_matches, html)
        return html
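
The em-margin arithmetic in merge_blanks is easy to check by hand. A small standalone sketch (hypothetical helper name):

```python
def blank_run_margin_em(blank_paragraphs):
    # 0.5em baseline for the first blank paragraph in a run, plus 1.5em
    # for every additional one, matching base_em/em_per_line above.
    base_em = 0.5
    em_per_line = 1.5
    return base_em + em_per_line * (blank_paragraphs - 1)

print(blank_run_margin_em(1))  # -> 0.5
print(blank_run_margin_em(3))  # -> 3.5
```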

    def detect_whitespace(self, html):
        blanks_around_headings = re.compile(
            r'(?P<initparas>(<(p|div)[^>]*>\s*</(p|div)>\s*){1,}\s*)?'
            r'(?P<content><h(?P<hnum>\d+)[^>]*>.*?</h(?P=hnum)>)(?P<endparas>\s*(<(p|div)[^>]*>\s*</(p|div)>\s*){1,})?', re.IGNORECASE|re.DOTALL)
        blanks_around_scene_breaks = re.compile(
            r'(?P<initparas>(<(p|div)[^>]*>\s*</(p|div)>\s*){1,}\s*)?'
            r'(?P<content><p class="scenebreak"[^>]*>.*?</p>)(?P<endparas>\s*(<(p|div)[^>]*>\s*</(p|div)>\s*){1,})?', re.IGNORECASE|re.DOTALL)
        blanks_n_nopunct = re.compile(
            r'(?P<initparas>(<p[^>]*>\s*</p>\s*){1,}\s*)?<p[^>]*>\s*(<(span|[ibu]|em|strong|font)[^>]*>\s*)*'
            r'.{1,100}?[^\W](</(span|[ibu]|em|strong|font)>\s*)*</p>(?P<endparas>\s*(<p[^>]*>\s*</p>\s*){1,})?', re.IGNORECASE|re.DOTALL)

        def merge_header_whitespace(match):
            initblanks = match.group('initparas')
            endblanks = match.group('endparas')
            content = match.group('content')
            top_margin = ''
            bottom_margin = ''
            if initblanks is not None:
                top_margin = 'margin-top:'+unicode_type(len(self.single_blank.findall(initblanks)))+'em;'
            if endblanks is not None:
                bottom_margin = 'margin-bottom:'+unicode_type(len(self.single_blank.findall(endblanks)))+'em;'

            if initblanks is None and endblanks is None:
                return content
            elif content.find('scenebreak') != -1:
                return content
            else:
                content = re.sub('(?i)<h(?P<hnum>\\d+)[^>]*>', '\n\n<h'+'\\g<hnum>'+' style="'+top_margin+bottom_margin+'">', content)
            return content

        html = blanks_around_headings.sub(merge_header_whitespace, html)
        html = blanks_around_scene_breaks.sub(merge_header_whitespace, html)

        def markup_whitespaces(match):
            blanks = match.group(0)
            blanks = self.blankreg.sub('\n<p class="whitespace" style="text-align:center; margin-top:0em; margin-bottom:0em"> </p>', blanks)
            return blanks

        html = blanks_n_nopunct.sub(markup_whitespaces, html)
        if self.html_preprocess_sections > self.min_chapters:
            html = re.sub('(?si)^.*?(?=<h\\d)', markup_whitespaces, html)

        return html

    def detect_soft_breaks(self, html):
        line = '(?P<initline>'+self.line_open+'\\s*(?P<init_content>.*?)'+self.line_close+')'
        line_two = '(?P<line_two>'+re.sub('(ou|in|cha)', 'linetwo_', self.line_open)+ \
            '\\s*(?P<line_two_content>.*?)'+re.sub('(ou|in|cha)', 'linetwo_', self.line_close)+')'
        div_break_candidate_pattern = line+'\\s*<div[^>]*>\\s*</div>\\s*'+line_two
        div_break_candidate = re.compile(r'%s' % div_break_candidate_pattern, re.IGNORECASE|re.UNICODE)

        def convert_div_softbreaks(match):
            init_is_paragraph = self.check_paragraph(match.group('init_content'))
            line_two_is_paragraph = self.check_paragraph(match.group('line_two_content'))
            if init_is_paragraph and line_two_is_paragraph:
                return (match.group('initline')+
                        '\n<p class="softbreak" style="margin-top:.5em; page-break-before:avoid; text-align:center"> </p>\n'+
                        match.group('line_two'))
            else:
                return match.group(0)

        html = div_break_candidate.sub(convert_div_softbreaks, html)

        if not self.blanks_deleted and self.blanks_between_paragraphs:
            html = self.multi_blank.sub('\n<p class="softbreak" style="margin-top:1em; page-break-before:avoid; text-align:center"> </p>', html)
        else:
            html = self.blankreg.sub('\n<p class="softbreak" style="margin-top:.5em; page-break-before:avoid; text-align:center"> </p>', html)
        return html

    def detect_scene_breaks(self, html):
        scene_break_regex = self.line_open+'(?!('+self.common_in_text_beginnings+'|.*?'+self.common_in_text_endings+ \
            '<))(?P<break>((?P<break_char>((?!\\s)\\W))\\s*(?P=break_char)?)+)\\s*'+self.line_close
        scene_breaks = re.compile(r'%s' % scene_break_regex, re.IGNORECASE|re.UNICODE)
        html = scene_breaks.sub(self.scene_break_open+'\\g<break>'+'</p>', html)
        return html

    def markup_user_break(self, replacement_break):
        '''
        Takes a string the user supplies and wraps it in markup that will be centered with
        appropriate margins. <hr> and <img> tags are allowed. If the user specifies
        a style with width attributes in the <hr> tag then the appropriate margins are
        applied to wrapping divs. This is because many ebook devices don't support margin:auto
        All other html is converted to text.
        '''
        hr_open = '<div id="scenebreak" style="margin-left: 45%; margin-right: 45%; margin-top:1.5em; margin-bottom:1.5em; page-break-before:avoid">'
        if re.findall('(<|>)', replacement_break):
            if re.match('^<hr', replacement_break):
                if replacement_break.find('width') != -1:
                    try:
                        width = int(re.sub('.*?width(:|=)(?P<wnum>\\d+).*', '\\g<wnum>', replacement_break))
                    except:
                        scene_break = hr_open+'<hr style="height: 3px; background:#505050" /></div>'
                        self.log.warn('Invalid replacement scene break'
                                      ' expression, using default')
                    else:
                        replacement_break = re.sub('(?i)(width=\\d+\\%?|width:\\s*\\d+(\\%|px|pt|em)?;?)', '', replacement_break)
                        divpercent = (100 - width) // 2
                        hr_open = re.sub('45', unicode_type(divpercent), hr_open)
                        scene_break = hr_open+replacement_break+'</div>'
                else:
                    scene_break = hr_open+'<hr style="height: 3px; background:#505050" /></div>'
            elif re.match('^<img', replacement_break):
                scene_break = self.scene_break_open+replacement_break+'</p>'
            else:
                from calibre.utils.html2text import html2text
                replacement_break = html2text(replacement_break)
                replacement_break = re.sub('\\s', ' ', replacement_break)
                scene_break = self.scene_break_open+replacement_break+'</p>'
        else:
            replacement_break = re.sub('\\s', ' ', replacement_break)
            scene_break = self.scene_break_open+replacement_break+'</p>'

        return scene_break
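
The centring math for a user-supplied `<hr width=NN%>` break can be sketched on its own (hypothetical helper name):

```python
import re

def hr_margin_percent(replacement_break):
    # Pull the numeric width out of e.g. '<hr width=20%>' and split the
    # remaining horizontal space evenly between the left and right
    # margins, as the divpercent computation above does.
    width = int(re.sub(r'.*?width(:|=)(?P<wnum>\d+).*', r'\g<wnum>', replacement_break))
    return (100 - width) // 2

print(hr_margin_percent('<hr width=20%>'))  # -> 40
```

A 20%-wide rule therefore gets 40% margins on each side, which centres it without relying on margin:auto.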

    def check_paragraph(self, content):
        content = re.sub('\\s*</?span[^>]*>\\s*', '', content)
        if re.match('.*[\"\'.!?:]$', content):
            # print "detected this as a paragraph"
            return True
        else:
            return False

    def abbyy_processor(self, html):
        abbyy_line = re.compile('((?P<linestart><p\\sstyle="(?P<styles>[^\"]*?);?">)(?P<content>.*?)(?P<lineend></p>)|(?P<image><img[^>]*>))', re.IGNORECASE)
        empty_paragraph = '\n<p> </p>\n'
        self.in_blockquote = False
        self.previous_was_paragraph = False
        html = re.sub('</?a[^>]*>', '', html)

        def convert_styles(match):
            # print "raw styles are: "+match.group('styles')
            content = match.group('content')
            # print "raw content is: "+match.group('content')
            image = match.group('image')

            is_paragraph = False
            text_align = ''
            text_indent = ''
            paragraph_before = ''
            paragraph_after = ''
            blockquote_open = '\n<blockquote>\n'
            blockquote_close = '</blockquote>\n'
            indented_text = 'text-indent:3%;'
            blockquote_open_loop = ''
            blockquote_close_loop = ''
            debugabby = False

            if image:
                debugabby = True
                if self.in_blockquote:
                    self.in_blockquote = False
                    blockquote_close_loop = blockquote_close
                self.previous_was_paragraph = False
                return blockquote_close_loop+'\n'+image+'\n'
            else:
                styles = match.group('styles').split(';')
                is_paragraph = self.check_paragraph(content)
                # print "styles for this line are: "+unicode_type(styles)
                split_styles = []
                for style in styles:
                    # print "style is: "+unicode_type(style)
                    newstyle = style.split(':')
                    # print "newstyle is: "+unicode_type(newstyle)
                    split_styles.append(newstyle)
                styles = split_styles
                for style, setting in styles:
                    if style == 'text-align' and setting != 'left':
                        text_align = style+':'+setting+';'
                    if style == 'text-indent':
                        setting = int(re.sub('\\s*pt\\s*', '', setting))
                        if 9 < setting < 14:
                            text_indent = indented_text
                        else:
                            text_indent = style+':'+unicode_type(setting)+'pt;'
                    if style == 'padding':
                        setting = re.sub('pt', '', setting).split(' ')
                        if int(setting[1]) < 16 and int(setting[3]) < 16:
                            if self.in_blockquote:
                                debugabby = True
                                if is_paragraph:
                                    self.in_blockquote = False
                                    blockquote_close_loop = blockquote_close
                            if int(setting[3]) > 8 and text_indent == '':
                                text_indent = indented_text
                            if int(setting[0]) > 5:
                                paragraph_before = empty_paragraph
                            if int(setting[2]) > 5:
                                paragraph_after = empty_paragraph
                        elif not self.in_blockquote and self.previous_was_paragraph:
                            debugabby = True
                            self.in_blockquote = True
                            blockquote_open_loop = blockquote_open
                        if debugabby:
                            self.log.debug('\n\n******\n')
                            self.log.debug('padding top is: '+unicode_type(setting[0]))
                            self.log.debug('padding right is: '+unicode_type(setting[1]))
                            self.log.debug('padding bottom is: '+unicode_type(setting[2]))
                            self.log.debug('padding left is: '+unicode_type(setting[3]))

                # print "text-align is: "+unicode_type(text_align)
                # print "\n***\nline is:\n "+unicode_type(match.group(0))+'\n'
                if debugabby:
                    # print "this line is a paragraph = "+unicode_type(is_paragraph)+", previous line was "+unicode_type(self.previous_was_paragraph)
                    self.log.debug("styles for this line were:", styles)
                    self.log.debug('newline is:')
                    self.log.debug(blockquote_open_loop+blockquote_close_loop+
                                   paragraph_before+'<p style="'+text_indent+text_align+
                                   '">'+content+'</p>'+paragraph_after+'\n\n\n\n\n')
                # print "is_paragraph is "+unicode_type(is_paragraph)+", previous_was_paragraph is "+unicode_type(self.previous_was_paragraph)
                self.previous_was_paragraph = is_paragraph
                # print "previous_was_paragraph is now set to "+unicode_type(self.previous_was_paragraph)+"\n\n\n"
                return blockquote_open_loop+blockquote_close_loop+paragraph_before+'<p style="'+text_indent+text_align+'">'+content+'</p>'+paragraph_after

        html = abbyy_line.sub(convert_styles, html)
        return html

    def __call__(self, html):
        self.log.debug("********* Heuristic processing HTML *********")
        # Count the words in the document to estimate how many chapters to look for and whether
        # other types of processing are attempted
        try:
            self.totalwords = self.get_word_count(html)
        except:
            self.log.warn("Can't get wordcount")

        if self.totalwords < 50:
            self.log.warn("flow is too short, not running heuristics")
            return html

        is_abbyy = self.is_abbyy(html)
        if is_abbyy:
            html = self.abbyy_processor(html)

        # Arrange line feeds and </p> tags so the line_length and no_markup functions work correctly
        html = self.arrange_htm_line_endings(html)
        # self.dump(html, 'after_arrange_line_endings')
        if self.cleanup_required():
            # ##### Check Markup ######
            #
            # some lit files don't have any <p> tags or equivalent (generally just plain text between
            # <pre> tags), check and mark up line endings if required before proceeding
            # fix indents must run after this step
            if self.no_markup(html, 0.1):
                self.log.debug("not enough paragraph markers, adding now")
                # markup using text processing
                html = self.markup_pre(html)

        # Replace series of non-breaking spaces with text-indent
        if getattr(self.extra_opts, 'fix_indents', False):
            html = self.fix_nbsp_indents(html)

        if self.cleanup_required():
            # fix indents must run before this step, as it removes non-breaking spaces
            html = self.cleanup_markup(html)

        is_pdftohtml = self.is_pdftohtml(html)
        if is_pdftohtml:
            self.line_open = "<(?P<outer>p)[^>]*>(\\s*<[ibu][^>]*>)?\\s*"
            self.line_close = "\\s*(</[ibu][^>]*>\\s*)?</(?P=outer)>"

        # ADE doesn't render <br />, change to empty paragraphs
        # html = re.sub('<br[^>]*>', u'<p>\u00a0</p>', html)

        # Determine whether the document uses interleaved blank lines
        self.blanks_between_paragraphs = self.analyze_blanks(html)

        # detect chapters/sections to match xpath or splitting logic

        if getattr(self.extra_opts, 'markup_chapter_headings', False):
            html = self.markup_chapters(html, self.totalwords, self.blanks_between_paragraphs)
        # self.dump(html, 'after_chapter_markup')

        if getattr(self.extra_opts, 'italicize_common_cases', False):
            html = self.markup_italicis(html)

        # If more than 40% of the lines are empty paragraphs and the user has enabled delete
        # blank paragraphs then delete blank lines to clean up spacing
        if self.blanks_between_paragraphs and getattr(self.extra_opts, 'delete_blank_paragraphs', False):
            self.log.debug("deleting blank lines")
            self.blanks_deleted = True
            html = self.multi_blank.sub('\n<p class="softbreak" style="margin-top:.5em; page-break-before:avoid; text-align:center"> </p>', html)
            html = self.blankreg.sub('', html)

        # Determine line ending type
        # Some OCR sourced files have line breaks in the html using a combination of span & p tags
        # span are used for hard line breaks, p for new paragraphs. Determine which is used so
        # that lines can be un-wrapped across page boundaries
        format = self.analyze_line_endings(html)

        # Check line histogram to determine if the document uses hard line breaks. If 50% or
        # more of the lines break in the same region of the document then unwrapping is required
        docanalysis = DocAnalysis(format, html)
        hardbreaks = docanalysis.line_histogram(.50)
        self.log.debug("Hard line breaks check returned "+unicode_type(hardbreaks))

        # Calculate Length
        unwrap_factor = getattr(self.extra_opts, 'html_unwrap_factor', 0.4)
        length = docanalysis.line_length(unwrap_factor)
        self.log.debug("Median line length is "+unicode_type(length)+", calculated with "+format+" format")

        # ##### Unwrap lines ######
        if getattr(self.extra_opts, 'unwrap_lines', False):
            # only go through unwrapping code if the histogram shows unwrapping is required or if the user decreased the default unwrap_factor
            if hardbreaks or unwrap_factor < 0.4:
                self.log.debug("Unwrapping required, unwrapping Lines")
                # Dehyphenate with line length limiters
                dehyphenator = Dehyphenator(self.extra_opts.verbose, self.log)
                html = dehyphenator(html, 'html', length)
                html = self.punctuation_unwrap(length, html, 'html')

        if getattr(self.extra_opts, 'dehyphenate', False):
            # dehyphenate in cleanup mode to fix anything previous conversions/editing missed
            self.log.debug("Fixing hyphenated content")
            dehyphenator = Dehyphenator(self.extra_opts.verbose, self.log)
            html = dehyphenator(html, 'html_cleanup', length)
            html = dehyphenator(html, 'individual_words', length)

        # If still no sections after unwrapping, mark split points on lines with no punctuation
        if self.html_preprocess_sections < self.min_chapters and getattr(self.extra_opts, 'markup_chapter_headings', False):
            self.log.debug("Looking for more split points based on punctuation,"
                           " currently have " + unicode_type(self.html_preprocess_sections))
            chapdetect3 = re.compile(
                r'<(?P<styles>(p|div)[^>]*)>\s*(?P<section>(<span[^>]*>)?\s*(?!([\W]+\s*)+)'
                r'(<[ibu][^>]*>){0,2}\s*(<span[^>]*>)?\s*(<[ibu][^>]*>){0,2}\s*(<span[^>]*>)?\s*'
                r'.?(?=[a-z#\-*\s]+<)([a-z#-*]+\s*){1,5}\s*\s*(</span>)?(</[ibu]>){0,2}\s*'
                r'(</span>)?\s*(</[ibu]>){0,2}\s*(</span>)?\s*</(p|div)>)', re.IGNORECASE)
            html = chapdetect3.sub(self.chapter_break, html)

        if getattr(self.extra_opts, 'renumber_headings', False):
            # search for places where a first or second level heading is immediately followed by another
            # top level heading. demote the second heading to h3 to prevent splitting between chapter
            # headings and titles, images, etc
            doubleheading = re.compile(
                r'(?P<firsthead><h(1|2)[^>]*>.+?</h(1|2)>\s*(<(?!h\d)[^>]*>\s*)*)<h(1|2)(?P<secondhead>[^>]*>.+?)</h(1|2)>', re.IGNORECASE)
            html = doubleheading.sub('\\g<firsthead>'+'\n<h3'+'\\g<secondhead>'+'</h3>', html)

        # If scene break formatting is enabled, find all blank paragraphs that definitely aren't scenebreaks
        # and style them with the 'whitespace' class. All remaining blank lines are styled as softbreaks.
        # Multiple sequential blank paragraphs are merged with appropriate margins.
        # If non-blank scene breaks exist they are center aligned and styled with appropriate margins.
        if getattr(self.extra_opts, 'format_scene_breaks', False):
            self.log.debug('Formatting scene breaks')
            html = re.sub('(?i)<div[^>]*>\\s*<br(\\s?/)?>\\s*</div>', '<p></p>', html)
            html = self.detect_scene_breaks(html)
            html = self.detect_whitespace(html)
            html = self.detect_soft_breaks(html)
            blanks_count = len(self.any_multi_blank.findall(html))
            if blanks_count >= 1:
                html = self.merge_blanks(html, blanks_count)
            detected_scene_break = re.compile(r'<p class="scenebreak"[^>]*>.*?</p>')
            scene_break_count = len(detected_scene_break.findall(html))
            # If the user has enabled scene break replacement, then either softbreaks
            # or 'hard' scene breaks are replaced, depending on which is in use
            # Otherwise separator lines are centered, use a bit larger margin in this case
            replacement_break = getattr(self.extra_opts, 'replace_scene_breaks', None)
            if replacement_break:
                replacement_break = self.markup_user_break(replacement_break)
                if scene_break_count >= 1:
                    html = detected_scene_break.sub(replacement_break, html)
                    html = re.sub('<p\\s+class="softbreak"[^>]*>\\s*</p>', replacement_break, html)
                else:
                    html = re.sub('<p\\s+class="softbreak"[^>]*>\\s*</p>', replacement_break, html)

        if self.deleted_nbsps:
            # put back non-breaking spaces in empty paragraphs so they render correctly
            html = self.anyblank.sub('\n'+r'\g<openline>'+'\u00a0'+r'\g<closeline>', html)
        return html

11
ebook_converter/ebooks/docx/__init__.py
Normal file
@@ -0,0 +1,11 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'


class InvalidDOCX(ValueError):
    pass
478	ebook_converter/ebooks/docx/block_styles.py	Normal file
@@ -0,0 +1,478 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import numbers
from collections import OrderedDict
from polyglot.builtins import iteritems


class Inherit(object):

    def __eq__(self, other):
        return other is self

    def __hash__(self):
        return id(self)

    def __lt__(self, other):
        return False

    def __gt__(self, other):
        return other is not self

    def __ge__(self, other):
        # inherit compares as the maximum of any ordering, so >= always holds
        return True

    def __le__(self, other):
        if self is other:
            return True
        return False


inherit = Inherit()
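The `Inherit` sentinel above is equal only to itself and compares greater than any concrete value, which lets the style-resolution code treat "no explicit value set" uniformly in comparisons and lookups. A self-contained sketch (class body reproduced from the diff):

```python
class Inherit(object):
    # reproduced from block_styles.py above
    def __eq__(self, other):
        return other is self

    def __hash__(self):
        return id(self)

    def __lt__(self, other):
        return False

    def __gt__(self, other):
        return other is not self

    def __ge__(self, other):
        return True

    def __le__(self, other):
        return self is other


inherit = Inherit()

print(inherit == inherit)               # True: equal only to itself
print(inherit > 12)                     # True: greater than any concrete value
print(max([4, inherit, 9]) is inherit)  # True: sorts after real values
```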
def binary_property(parent, name, XPath, get):
    vals = XPath('./w:%s' % name)(parent)
    if not vals:
        return inherit
    val = get(vals[0], 'w:val', 'on')
    return val in {'on', '1', 'true'}


def simple_color(col, auto='black'):
    if not col or col == 'auto' or len(col) != 6:
        return auto
    return '#' + col


def simple_float(val, mult=1.0):
    try:
        return float(val) * mult
    except (ValueError, TypeError, AttributeError, KeyError):
        pass


def twips(val, mult=0.05):
    '''Parse val as either a pure number representing twentieths of a point,
    or a number followed by the suffix pt, representing points.'''
    try:
        return float(val) * mult
    except (ValueError, TypeError, AttributeError, KeyError):
        if val and val.endswith('pt') and mult == 0.05:
            return twips(val[:-2], mult=1.0)
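OOXML lengths in these helpers are mostly twips (twentieths of a point), hence the default `mult=0.05` converting to points. Minimal copies of the two converters above, showing the conversions:

```python
def simple_float(val, mult=1.0):
    # copied from block_styles.py: returns None on any unparsable input
    try:
        return float(val) * mult
    except (ValueError, TypeError, AttributeError, KeyError):
        pass


def twips(val, mult=0.05):
    # copied from block_styles.py: accepts raw twips or an explicit 'pt' suffix
    try:
        return float(val) * mult
    except (ValueError, TypeError, AttributeError, KeyError):
        if val and val.endswith('pt') and mult == 0.05:
            return twips(val[:-2], mult=1.0)


print(twips('240'))   # 12.0 -- 240 twips is 12pt
print(twips('12pt'))  # 12.0 -- pt suffix taken at face value
print(twips(None))    # None -- bad input is swallowed, caller checks for None
```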

LINE_STYLES = {  # {{{
    'basicBlackDashes': 'dashed',
    'basicBlackDots': 'dotted',
    'basicBlackSquares': 'dashed',
    'basicThinLines': 'solid',
    'dashDotStroked': 'groove',
    'dashed': 'dashed',
    'dashSmallGap': 'dashed',
    'dotDash': 'dashed',
    'dotDotDash': 'dashed',
    'dotted': 'dotted',
    'double': 'double',
    'inset': 'inset',
    'nil': 'none',
    'none': 'none',
    'outset': 'outset',
    'single': 'solid',
    'thick': 'solid',
    'thickThinLargeGap': 'double',
    'thickThinMediumGap': 'double',
    'thickThinSmallGap': 'double',
    'thinThickLargeGap': 'double',
    'thinThickMediumGap': 'double',
    'thinThickSmallGap': 'double',
    'thinThickThinLargeGap': 'double',
    'thinThickThinMediumGap': 'double',
    'thinThickThinSmallGap': 'double',
    'threeDEmboss': 'ridge',
    'threeDEngrave': 'groove',
    'triple': 'double',
}  # }}}


# Read from XML {{{

border_props = ('padding_%s', 'border_%s_width', 'border_%s_style', 'border_%s_color')
border_edges = ('left', 'top', 'right', 'bottom', 'between')


def read_single_border(parent, edge, XPath, get):
    color = style = width = padding = None
    for elem in XPath('./w:%s' % edge)(parent):
        c = get(elem, 'w:color')
        if c is not None:
            color = simple_color(c)
        s = get(elem, 'w:val')
        if s is not None:
            style = LINE_STYLES.get(s, 'solid')
        space = get(elem, 'w:space')
        if space is not None:
            try:
                padding = float(space)
            except (ValueError, TypeError):
                pass
        sz = get(elem, 'w:sz')
        if sz is not None:
            # we don't care about art borders (they are only used for page borders)
            try:
                width = min(96, max(2, float(sz))) / 8
            except (ValueError, TypeError):
                pass
    return {p: v for p, v in zip(border_props, (padding, width, style, color))}


def read_border(parent, dest, XPath, get, border_edges=border_edges, name='pBdr'):
    vals = {k % edge: inherit for edge in border_edges for k in border_props}

    for border in XPath('./w:' + name)(parent):
        for edge in border_edges:
            for prop, val in iteritems(read_single_border(border, edge, XPath, get)):
                if val is not None:
                    vals[prop % edge] = val

    for key, val in iteritems(vals):
        setattr(dest, key, val)
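`read_single_border` converts `w:sz` (measured in eighths of a point) to CSS points, clamping to a 0.25pt–12pt range. The arithmetic in isolation:

```python
def border_width_pt(sz):
    # the clamp from read_single_border: w:sz is in eighths of a point,
    # bounded to [2, 96] eighths before dividing down to points
    return min(96, max(2, float(sz))) / 8


print(border_width_pt('4'))    # 0.5
print(border_width_pt('800'))  # 12.0 -- oversized values are clamped
print(border_width_pt('0'))    # 0.25 -- hairlines are bumped up
```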
def border_to_css(edge, style, css):
    bs = getattr(style, 'border_%s_style' % edge)
    bc = getattr(style, 'border_%s_color' % edge)
    bw = getattr(style, 'border_%s_width' % edge)
    if isinstance(bw, numbers.Number):
        # WebKit needs at least 1pt to render borders and 3pt to render double borders
        bw = max(bw, (3 if bs == 'double' else 1))
    if bs is not inherit and bs is not None:
        css['border-%s-style' % edge] = bs
    if bc is not inherit and bc is not None:
        css['border-%s-color' % edge] = bc
    if bw is not inherit and bw is not None:
        if isinstance(bw, numbers.Number):
            bw = '%.3gpt' % bw
        css['border-%s-width' % edge] = bw
def read_indent(parent, dest, XPath, get):
    padding_left = padding_right = text_indent = inherit
    for indent in XPath('./w:ind')(parent):
        l, lc = get(indent, 'w:left'), get(indent, 'w:leftChars')
        pl = simple_float(lc, 0.01) if lc is not None else simple_float(l, 0.05) if l is not None else None
        if pl is not None:
            padding_left = '%.3g%s' % (pl, 'em' if lc is not None else 'pt')

        r, rc = get(indent, 'w:right'), get(indent, 'w:rightChars')
        pr = simple_float(rc, 0.01) if rc is not None else simple_float(r, 0.05) if r is not None else None
        if pr is not None:
            padding_right = '%.3g%s' % (pr, 'em' if rc is not None else 'pt')

        h, hc = get(indent, 'w:hanging'), get(indent, 'w:hangingChars')
        fl, flc = get(indent, 'w:firstLine'), get(indent, 'w:firstLineChars')
        h = h if h is None else '-' + h
        hc = hc if hc is None else '-' + hc
        ti = (simple_float(hc, 0.01) if hc is not None else simple_float(h, 0.05) if h is not None else
              simple_float(flc, 0.01) if flc is not None else simple_float(fl, 0.05) if fl is not None else None)
        if ti is not None:
            text_indent = '%.3g%s' % (ti, 'em' if hc is not None or (h is None and flc is not None) else 'pt')

    setattr(dest, 'margin_left', padding_left)
    setattr(dest, 'margin_right', padding_right)
    setattr(dest, 'text_indent', text_indent)


def read_justification(parent, dest, XPath, get):
    ans = inherit
    for jc in XPath('./w:jc[@w:val]')(parent):
        val = get(jc, 'w:val')
        if not val:
            continue
        if val in {'both', 'distribute'} or 'thai' in val or 'kashida' in val:
            ans = 'justify'
        elif val in {'left', 'center', 'right', 'start', 'end'}:
            # map the logical start/end values onto physical left/right
            ans = {'start': 'left', 'end': 'right'}.get(val, val)
    setattr(dest, 'text_align', ans)
def read_spacing(parent, dest, XPath, get):
    padding_top = padding_bottom = line_height = inherit
    for s in XPath('./w:spacing')(parent):
        a, al, aa = get(s, 'w:after'), get(s, 'w:afterLines'), get(s, 'w:afterAutospacing')
        pb = None if aa in {'on', '1', 'true'} else simple_float(al, 0.02) if al is not None else simple_float(a, 0.05) if a is not None else None
        if pb is not None:
            padding_bottom = '%.3g%s' % (pb, 'ex' if al is not None else 'pt')

        b, bl, bb = get(s, 'w:before'), get(s, 'w:beforeLines'), get(s, 'w:beforeAutospacing')
        pt = None if bb in {'on', '1', 'true'} else simple_float(bl, 0.02) if bl is not None else simple_float(b, 0.05) if b is not None else None
        if pt is not None:
            padding_top = '%.3g%s' % (pt, 'ex' if bl is not None else 'pt')

        l, lr = get(s, 'w:line'), get(s, 'w:lineRule', 'auto')
        if l is not None:
            lh = simple_float(l, 0.05) if lr in {'exact', 'atLeast'} else simple_float(l, 1 / 240.0)
            if lh is not None:
                line_height = '%.3g%s' % (lh, 'pt' if lr in {'exact', 'atLeast'} else '')

    setattr(dest, 'margin_top', padding_top)
    setattr(dest, 'margin_bottom', padding_bottom)
    setattr(dest, 'line_height', line_height)
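In `read_spacing`, `w:line` is interpreted two ways: with a `lineRule` of `exact`/`atLeast` it is twips (so × 0.05 gives points), otherwise it is 240ths of a line (so 480 means double spacing, emitted as a unitless CSS line-height). A condensed sketch of just that branch:

```python
def line_height(l, lr='auto'):
    # condensed from read_spacing above
    if lr in {'exact', 'atLeast'}:
        return '%.3gpt' % (float(l) * 0.05)  # twips -> points
    return '%.3g' % (float(l) / 240.0)       # 240ths of a line, unitless


print(line_height('480'))           # 2 -- double spacing
print(line_height('360', 'exact'))  # 18pt
```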
def read_shd(parent, dest, XPath, get):
    ans = inherit
    for shd in XPath('./w:shd[@w:fill]')(parent):
        val = get(shd, 'w:fill')
        if val:
            ans = simple_color(val, auto='transparent')
    setattr(dest, 'background_color', ans)


def read_numbering(parent, dest, XPath, get):
    lvl = num_id = inherit
    for np in XPath('./w:numPr')(parent):
        for ilvl in XPath('./w:ilvl[@w:val]')(np):
            try:
                lvl = int(get(ilvl, 'w:val'))
            except (ValueError, TypeError):
                pass
        for num in XPath('./w:numId[@w:val]')(np):
            num_id = get(num, 'w:val')
    setattr(dest, 'numbering_id', num_id)
    setattr(dest, 'numbering_level', lvl)
class Frame(object):

    all_attributes = ('drop_cap', 'h', 'w', 'h_anchor', 'h_rule', 'v_anchor', 'wrap',
                      'h_space', 'v_space', 'lines', 'x_align', 'y_align', 'x', 'y')

    def __init__(self, fp, XPath, get):
        self.drop_cap = get(fp, 'w:dropCap', 'none')
        try:
            self.h = int(get(fp, 'w:h')) / 20
        except (ValueError, TypeError):
            self.h = 0
        try:
            self.w = int(get(fp, 'w:w')) / 20
        except (ValueError, TypeError):
            self.w = None
        try:
            self.x = int(get(fp, 'w:x')) / 20
        except (ValueError, TypeError):
            self.x = 0
        try:
            self.y = int(get(fp, 'w:y')) / 20
        except (ValueError, TypeError):
            self.y = 0

        self.h_anchor = get(fp, 'w:hAnchor', 'page')
        self.h_rule = get(fp, 'w:hRule', 'auto')
        self.v_anchor = get(fp, 'w:vAnchor', 'page')
        self.wrap = get(fp, 'w:wrap', 'around')
        self.x_align = get(fp, 'w:xAlign')
        self.y_align = get(fp, 'w:yAlign')

        try:
            self.h_space = int(get(fp, 'w:hSpace')) / 20
        except (ValueError, TypeError):
            self.h_space = 0
        try:
            self.v_space = int(get(fp, 'w:vSpace')) / 20
        except (ValueError, TypeError):
            self.v_space = 0
        try:
            self.lines = int(get(fp, 'w:lines'))
        except (ValueError, TypeError):
            self.lines = 1

    def css(self, page):
        is_dropcap = self.drop_cap in {'drop', 'margin'}
        ans = {'overflow': 'hidden'}

        if is_dropcap:
            ans['float'] = 'left'
            ans['margin'] = '0'
            ans['padding-right'] = '0.2em'
        else:
            if self.h_rule != 'auto':
                t = 'min-height' if self.h_rule == 'atLeast' else 'height'
                ans[t] = '%.3gpt' % self.h
            if self.w is not None:
                ans['width'] = '%.3gpt' % self.w
            ans['padding-top'] = ans['padding-bottom'] = '%.3gpt' % self.v_space
            if self.wrap not in {None, 'none'}:
                ans['padding-left'] = ans['padding-right'] = '%.3gpt' % self.h_space
                if self.x_align is None:
                    fl = 'left' if self.x / page.width < 0.5 else 'right'
                else:
                    fl = 'right' if self.x_align == 'right' else 'left'
                ans['float'] = fl
        return ans

    def __eq__(self, other):
        for x in self.all_attributes:
            if getattr(other, x, inherit) != getattr(self, x):
                return False
        return True

    def __ne__(self, other):
        return not self.__eq__(other)
def read_frame(parent, dest, XPath, get):
    ans = inherit
    for fp in XPath('./w:framePr')(parent):
        ans = Frame(fp, XPath, get)
    setattr(dest, 'frame', ans)

# }}}
class ParagraphStyle(object):

    all_properties = (
        'adjustRightInd', 'autoSpaceDE', 'autoSpaceDN', 'bidi',
        'contextualSpacing', 'keepLines', 'keepNext', 'mirrorIndents',
        'pageBreakBefore', 'snapToGrid', 'suppressLineNumbers',
        'suppressOverlap', 'topLinePunct', 'widowControl', 'wordWrap',

        # Border margins padding
        'border_left_width', 'border_left_style', 'border_left_color', 'padding_left',
        'border_top_width', 'border_top_style', 'border_top_color', 'padding_top',
        'border_right_width', 'border_right_style', 'border_right_color', 'padding_right',
        'border_bottom_width', 'border_bottom_style', 'border_bottom_color', 'padding_bottom',
        'border_between_width', 'border_between_style', 'border_between_color', 'padding_between',
        'margin_left', 'margin_top', 'margin_right', 'margin_bottom',

        # Misc.
        'text_indent', 'text_align', 'line_height', 'background_color',
        'numbering_id', 'numbering_level', 'font_family', 'font_size', 'color', 'frame',
        'cs_font_size', 'cs_font_family',
    )

    def __init__(self, namespace, pPr=None):
        self.namespace = namespace
        self.linked_style = None
        if pPr is None:
            for p in self.all_properties:
                setattr(self, p, inherit)
        else:
            for p in (
                'adjustRightInd', 'autoSpaceDE', 'autoSpaceDN', 'bidi',
                'contextualSpacing', 'keepLines', 'keepNext', 'mirrorIndents',
                'pageBreakBefore', 'snapToGrid', 'suppressLineNumbers',
                'suppressOverlap', 'topLinePunct', 'widowControl', 'wordWrap',
            ):
                setattr(self, p, binary_property(pPr, p, namespace.XPath, namespace.get))

            for x in ('border', 'indent', 'justification', 'spacing', 'shd', 'numbering', 'frame'):
                f = read_funcs[x]
                f(pPr, self, namespace.XPath, namespace.get)

            for s in namespace.XPath('./w:pStyle[@w:val]')(pPr):
                self.linked_style = namespace.get(s, 'w:val')

        self.font_family = self.font_size = self.color = self.cs_font_size = self.cs_font_family = inherit

        self._css = None
        self._border_key = None

    def update(self, other):
        for prop in self.all_properties:
            nval = getattr(other, prop)
            if nval is not inherit:
                setattr(self, prop, nval)
        if other.linked_style is not None:
            self.linked_style = other.linked_style

    def resolve_based_on(self, parent):
        for p in self.all_properties:
            val = getattr(self, p)
            if val is inherit:
                setattr(self, p, getattr(parent, p))

    @property
    def css(self):
        if self._css is None:
            self._css = c = OrderedDict()
            if self.keepLines is True:
                c['page-break-inside'] = 'avoid'
            if self.pageBreakBefore is True:
                c['page-break-before'] = 'always'
            if self.keepNext is True:
                c['page-break-after'] = 'avoid'
            for edge in ('left', 'top', 'right', 'bottom'):
                border_to_css(edge, self, c)
                val = getattr(self, 'padding_%s' % edge)
                if val is not inherit:
                    c['padding-%s' % edge] = '%.3gpt' % val
                val = getattr(self, 'margin_%s' % edge)
                if val is not inherit:
                    c['margin-%s' % edge] = val

            if self.line_height not in {inherit, '1'}:
                c['line-height'] = self.line_height

            for x in ('text_indent', 'background_color', 'font_family', 'font_size', 'color'):
                val = getattr(self, x)
                if val is not inherit:
                    if x == 'font_size':
                        val = '%.3gpt' % val
                    c[x.replace('_', '-')] = val
            ta = self.text_align
            if ta is not inherit:
                if self.bidi is True:
                    ta = {'left': 'right', 'right': 'left'}.get(ta, ta)
                c['text-align'] = ta

        return self._css

    @property
    def border_key(self):
        if self._border_key is None:
            k = []
            for edge in border_edges:
                for prop in border_props:
                    prop = prop % edge
                    k.append(getattr(self, prop))
            self._border_key = tuple(k)
        return self._border_key

    def has_identical_borders(self, other_style):
        return self.border_key == getattr(other_style, 'border_key', None)

    def clear_borders(self):
        for edge in border_edges[:-1]:
            for prop in ('width', 'color', 'style'):
                setattr(self, 'border_%s_%s' % (edge, prop), inherit)

    def clone_border_styles(self):
        style = ParagraphStyle(self.namespace)
        for edge in border_edges[:-1]:
            for prop in ('width', 'color', 'style'):
                attr = 'border_%s_%s' % (edge, prop)
                setattr(style, attr, getattr(self, attr))
        return style

    def apply_between_border(self):
        for prop in ('width', 'color', 'style'):
            setattr(self, 'border_bottom_%s' % prop, getattr(self, 'border_between_%s' % prop))

    def has_visible_border(self):
        for edge in border_edges[:-1]:
            bw, bs = getattr(self, 'border_%s_width' % edge), getattr(self, 'border_%s_style' % edge)
            if bw is not inherit and bw and bs is not inherit and bs != 'none':
                return True
        return False


read_funcs = {k[5:]: v for k, v in iteritems(globals()) if k.startswith('read_')}
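The last line of block_styles.py builds a dispatch table by naming convention: every module-level `read_<x>` function becomes `read_funcs['<x>']`, which is what `ParagraphStyle.__init__` indexes. A toy reproduction of the trick (the two stub functions are invented for illustration):

```python
def read_border(parent, dest):
    dest['border'] = 'read'


def read_indent(parent, dest):
    dest['indent'] = 'read'


# strip the 'read_' prefix to get the dispatch key, as block_styles.py does
read_funcs = {k[5:]: v for k, v in globals().items() if k.startswith('read_')}

dest = {}
read_funcs['border'](None, dest)
print(dest)  # {'border': 'read'}
```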
302	ebook_converter/ebooks/docx/char_styles.py	Normal file
@@ -0,0 +1,302 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

from collections import OrderedDict
from calibre.ebooks.docx.block_styles import (  # noqa
    inherit, simple_color, LINE_STYLES, simple_float, binary_property, read_shd)

# Read from XML {{{
def read_text_border(parent, dest, XPath, get):
    border_color = border_style = border_width = padding = inherit
    elems = XPath('./w:bdr')(parent)
    if elems and elems[0].attrib:
        border_color = simple_color('auto')
        border_style = 'none'
        border_width = 1
    for elem in elems:
        color = get(elem, 'w:color')
        if color is not None:
            border_color = simple_color(color)
        style = get(elem, 'w:val')
        if style is not None:
            border_style = LINE_STYLES.get(style, 'solid')
        space = get(elem, 'w:space')
        if space is not None:
            try:
                padding = float(space)
            except (ValueError, TypeError):
                pass
        sz = get(elem, 'w:sz')
        if sz is not None:
            # we don't care about art borders (they are only used for page borders)
            try:
                # A border of less than 1pt is not rendered by WebKit
                border_width = min(96, max(8, float(sz))) / 8
            except (ValueError, TypeError):
                pass

    setattr(dest, 'border_color', border_color)
    setattr(dest, 'border_style', border_style)
    setattr(dest, 'border_width', border_width)
    setattr(dest, 'padding', padding)
def read_color(parent, dest, XPath, get):
    ans = inherit
    for col in XPath('./w:color[@w:val]')(parent):
        val = get(col, 'w:val')
        if not val:
            continue
        ans = simple_color(val)
    setattr(dest, 'color', ans)


def convert_highlight_color(val):
    return {
        'darkBlue': '#000080', 'darkCyan': '#008080', 'darkGray': '#808080',
        'darkGreen': '#008000', 'darkMagenta': '#800080', 'darkRed': '#800000',
        'darkYellow': '#808000', 'lightGray': '#c0c0c0'}.get(val, val)
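`convert_highlight_color` only needs to special-case the `dark*` and `lightGray` names; the remaining OOXML highlight values (`yellow`, `red`, and so on) are already valid CSS color keywords and pass through unchanged. Reproduced with a couple of probes:

```python
def convert_highlight_color(val):
    # copied from char_styles.py: map OOXML-only names to hex,
    # pass the rest through as CSS color keywords
    return {
        'darkBlue': '#000080', 'darkCyan': '#008080', 'darkGray': '#808080',
        'darkGreen': '#008000', 'darkMagenta': '#800080', 'darkRed': '#800000',
        'darkYellow': '#808000', 'lightGray': '#c0c0c0'}.get(val, val)


print(convert_highlight_color('darkRed'))  # #800000
print(convert_highlight_color('yellow'))   # yellow -- already valid CSS
```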
def read_highlight(parent, dest, XPath, get):
    ans = inherit
    for col in XPath('./w:highlight[@w:val]')(parent):
        val = get(col, 'w:val')
        if not val:
            continue
        if val == 'none':
            val = 'transparent'
        else:
            val = convert_highlight_color(val)
        ans = val
    setattr(dest, 'highlight', ans)
def read_lang(parent, dest, XPath, get):
    ans = inherit
    for col in XPath('./w:lang[@w:val]')(parent):
        val = get(col, 'w:val')
        if not val:
            continue
        try:
            code = int(val, 16)
        except (ValueError, TypeError):
            ans = val
        else:
            from calibre.ebooks.docx.lcid import lcid
            val = lcid.get(code, None)
            if val:
                ans = val
    setattr(dest, 'lang', ans)
def read_letter_spacing(parent, dest, XPath, get):
    ans = inherit
    for col in XPath('./w:spacing[@w:val]')(parent):
        val = simple_float(get(col, 'w:val'), 0.05)
        if val is not None:
            ans = val
    setattr(dest, 'letter_spacing', ans)


def read_underline(parent, dest, XPath, get):
    ans = inherit
    for col in XPath('./w:u[@w:val]')(parent):
        val = get(col, 'w:val')
        if val:
            ans = val if val == 'none' else 'underline'
    setattr(dest, 'text_decoration', ans)
def read_vert_align(parent, dest, XPath, get):
    ans = inherit
    for col in XPath('./w:vertAlign[@w:val]')(parent):
        val = get(col, 'w:val')
        if val and val in {'baseline', 'subscript', 'superscript'}:
            ans = val
    setattr(dest, 'vert_align', ans)


def read_position(parent, dest, XPath, get):
    ans = inherit
    for col in XPath('./w:position[@w:val]')(parent):
        val = get(col, 'w:val')
        try:
            ans = float(val) / 2.0
        except Exception:
            pass
    setattr(dest, 'position', ans)
def read_font(parent, dest, XPath, get):
    ff = inherit
    for col in XPath('./w:rFonts')(parent):
        val = get(col, 'w:asciiTheme')
        if val:
            val = '|%s|' % val
        else:
            val = get(col, 'w:ascii')
        if val:
            ff = val
    setattr(dest, 'font_family', ff)
    for col in XPath('./w:sz[@w:val]')(parent):
        val = simple_float(get(col, 'w:val'), 0.5)
        if val is not None:
            setattr(dest, 'font_size', val)
            return
    setattr(dest, 'font_size', inherit)
def read_font_cs(parent, dest, XPath, get):
    ff = inherit
    for col in XPath('./w:rFonts')(parent):
        val = get(col, 'w:csTheme')
        if val:
            val = '|%s|' % val
        else:
            val = get(col, 'w:cs')
        if val:
            ff = val
    setattr(dest, 'cs_font_family', ff)
    for col in XPath('./w:szCS[@w:val]')(parent):
        val = simple_float(get(col, 'w:val'), 0.5)
        if val is not None:
            # the complex-script size belongs in cs_font_size, not font_size
            setattr(dest, 'cs_font_size', val)
            return
    setattr(dest, 'cs_font_size', inherit)

# }}}
class RunStyle(object):

    all_properties = {
        'b', 'bCs', 'caps', 'cs', 'dstrike', 'emboss', 'i', 'iCs', 'imprint',
        'rtl', 'shadow', 'smallCaps', 'strike', 'vanish', 'webHidden',

        'border_color', 'border_style', 'border_width', 'padding', 'color', 'highlight', 'background_color',
        'letter_spacing', 'font_size', 'text_decoration', 'vert_align', 'lang', 'font_family', 'position',
        'cs_font_size', 'cs_font_family',
    }

    toggle_properties = {
        'b', 'bCs', 'caps', 'emboss', 'i', 'iCs', 'imprint', 'shadow', 'smallCaps', 'strike', 'vanish',
    }

    def __init__(self, namespace, rPr=None):
        self.namespace = namespace
        self.linked_style = None
        if rPr is None:
            for p in self.all_properties:
                setattr(self, p, inherit)
        else:
            X, g = namespace.XPath, namespace.get
            for p in (
                'b', 'bCs', 'caps', 'cs', 'dstrike', 'emboss', 'i', 'iCs', 'imprint', 'rtl', 'shadow',
                'smallCaps', 'strike', 'vanish', 'webHidden',
            ):
                setattr(self, p, binary_property(rPr, p, X, g))

            read_font(rPr, self, X, g)
            read_font_cs(rPr, self, X, g)
            read_text_border(rPr, self, X, g)
            read_color(rPr, self, X, g)
            read_highlight(rPr, self, X, g)
            read_shd(rPr, self, X, g)
            read_letter_spacing(rPr, self, X, g)
            read_underline(rPr, self, X, g)
            read_vert_align(rPr, self, X, g)
            read_position(rPr, self, X, g)
            read_lang(rPr, self, X, g)

            for s in X('./w:rStyle[@w:val]')(rPr):
                self.linked_style = g(s, 'w:val')

        self._css = None

    def update(self, other):
        for prop in self.all_properties:
            nval = getattr(other, prop)
            if nval is not inherit:
                setattr(self, prop, nval)
        if other.linked_style is not None:
            self.linked_style = other.linked_style

    def resolve_based_on(self, parent):
        for p in self.all_properties:
            val = getattr(self, p)
            if val is inherit:
                setattr(self, p, getattr(parent, p))

    def get_border_css(self, ans):
        for x in ('color', 'style', 'width'):
            val = getattr(self, 'border_' + x)
            if x == 'width' and val is not inherit:
                val = '%.3gpt' % val
            if val is not inherit:
                ans['border-%s' % x] = val

    def clear_border_css(self):
        for x in ('color', 'style', 'width'):
            setattr(self, 'border_' + x, inherit)

    @property
    def css(self):
        if self._css is None:
            c = self._css = OrderedDict()
            td = set()
            if self.text_decoration is not inherit:
                td.add(self.text_decoration)
            if self.strike and self.strike is not inherit:
                td.add('line-through')
            if self.dstrike and self.dstrike is not inherit:
                td.add('line-through')
            if td:
                c['text-decoration'] = ' '.join(td)
            if self.caps is True:
                c['text-transform'] = 'uppercase'
            if self.i is True:
                c['font-style'] = 'italic'
            if self.shadow and self.shadow is not inherit:
                c['text-shadow'] = '2px 2px'
            if self.smallCaps is True:
                c['font-variant'] = 'small-caps'
            if self.vanish is True or self.webHidden is True:
                c['display'] = 'none'

            self.get_border_css(c)
            if self.padding is not inherit:
                c['padding'] = '%.3gpt' % self.padding

            for x in ('color', 'background_color'):
                val = getattr(self, x)
                if val is not inherit:
                    c[x.replace('_', '-')] = val

            for x in ('letter_spacing', 'font_size'):
                val = getattr(self, x)
                if val is not inherit:
                    c[x.replace('_', '-')] = '%.3gpt' % val

            if self.position is not inherit:
                c['vertical-align'] = '%.3gpt' % self.position

            if self.highlight is not inherit and self.highlight != 'transparent':
                c['background-color'] = self.highlight

            if self.b:
                c['font-weight'] = 'bold'

            if self.font_family is not inherit:
                c['font-family'] = self.font_family

        return self._css

    def same_border(self, other):
        return self.get_border_css({}) == other.get_border_css({})
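One detail of `RunStyle.css` above: underline, strike and double-strike all funnel into a single CSS `text-decoration` value via a set, so double-strike degrades to a plain line-through. The accumulation, extracted (sorted here for a stable result, which the original does not guarantee):

```python
def text_decoration(underline, strike, dstrike):
    # extracted from RunStyle.css: collect all decorations into one property
    td = set()
    if underline:
        td.add('underline')
    if strike or dstrike:
        td.add('line-through')
    return ' '.join(sorted(td)) if td else None


print(text_decoration(True, True, False))   # line-through underline
print(text_decoration(False, False, True))  # line-through
```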
235	ebook_converter/ebooks/docx/cleanup.py	Normal file
@@ -0,0 +1,235 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import os
from polyglot.builtins import itervalues, range

NBSP = '\xa0'
def mergeable(previous, current):
    if previous.tail or current.tail:
        return False
    if previous.get('class', None) != current.get('class', None):
        return False
    if current.get('id', False):
        return False
    for attr in ('style', 'lang', 'dir'):
        if previous.get(attr) != current.get(attr):
            return False
    try:
        return next(previous.itersiblings()) is current
    except StopIteration:
        return False


def append_text(parent, text):
    if len(parent) > 0:
        parent[-1].tail = (parent[-1].tail or '') + text
    else:
        parent.text = (parent.text or '') + text


def merge(parent, span):
    if span.text:
        append_text(parent, span.text)
    for child in span:
        parent.append(child)
    if span.tail:
        append_text(parent, span.tail)
    span.getparent().remove(span)


def merge_run(run):
    parent = run[0]
    for span in run[1:]:
        merge(parent, span)
def liftable(css):
|
||||||
|
# A <span> is liftable if all its styling would work just as well if it is
|
||||||
|
# specified on the parent element.
|
||||||
|
prefixes = {x.partition('-')[0] for x in css}
|
||||||
|
return not (prefixes - {'text', 'font', 'letter', 'color', 'background'})
|
||||||
|
|
||||||
|
|
||||||
|
def add_text(elem, attr, text):
|
||||||
|
old = getattr(elem, attr) or ''
|
||||||
|
setattr(elem, attr, old + text)
|
||||||
|
|
||||||
|
|
||||||
|
def lift(span):
|
||||||
|
# Replace an element by its content (text, children and tail)
|
||||||
|
parent = span.getparent()
|
||||||
|
idx = parent.index(span)
|
||||||
|
try:
|
||||||
|
last_child = span[-1]
|
||||||
|
except IndexError:
|
||||||
|
last_child = None
|
||||||
|
|
||||||
|
if span.text:
|
||||||
|
if idx == 0:
|
||||||
|
add_text(parent, 'text', span.text)
|
||||||
|
else:
|
||||||
|
add_text(parent[idx - 1], 'tail', span.text)
|
||||||
|
|
||||||
|
for child in reversed(span):
|
||||||
|
parent.insert(idx, child)
|
||||||
|
parent.remove(span)
|
||||||
|
|
||||||
|
if span.tail:
|
||||||
|
if last_child is None:
|
||||||
|
if idx == 0:
|
||||||
|
add_text(parent, 'text', span.tail)
|
||||||
|
else:
|
||||||
|
add_text(parent[idx - 1], 'tail', span.tail)
|
||||||
|
else:
|
||||||
|
add_text(last_child, 'tail', span.tail)
|
||||||
|
|
||||||
|
|
||||||
|
def before_count(root, tag, limit=10):
|
||||||
|
body = root.xpath('//body[1]')
|
||||||
|
if not body:
|
||||||
|
return limit
|
||||||
|
ans = 0
|
||||||
|
for elem in body[0].iterdescendants():
|
||||||
|
if elem is tag:
|
||||||
|
return ans
|
||||||
|
ans += 1
|
||||||
|
if ans > limit:
|
||||||
|
return limit
|
||||||
|
|
||||||
|
|
||||||
|
def wrap_contents(tag_name, elem):
|
||||||
|
wrapper = elem.makeelement(tag_name)
|
||||||
|
wrapper.text, elem.text = elem.text, ''
|
||||||
|
for child in elem:
|
||||||
|
elem.remove(child)
|
||||||
|
wrapper.append(child)
|
||||||
|
elem.append(wrapper)
|
||||||
|
|
||||||
|
|
||||||
|
def cleanup_markup(log, root, styles, dest_dir, detect_cover, XPath):
|
||||||
|
# Apply vertical-align
|
||||||
|
for span in root.xpath('//span[@data-docx-vert]'):
|
||||||
|
wrap_contents(span.attrib.pop('data-docx-vert'), span)
|
||||||
|
|
||||||
|
# Move <hr>s outside paragraphs, if possible.
|
||||||
|
pancestor = XPath('|'.join('ancestor::%s[1]' % x for x in ('p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6')))
|
||||||
|
for hr in root.xpath('//span/hr'):
|
||||||
|
p = pancestor(hr)
|
||||||
|
if p:
|
||||||
|
p = p[0]
|
||||||
|
descendants = tuple(p.iterdescendants())
|
||||||
|
if descendants[-1] is hr:
|
||||||
|
parent = p.getparent()
|
||||||
|
idx = parent.index(p)
|
||||||
|
parent.insert(idx+1, hr)
|
||||||
|
hr.tail = '\n\t'
|
||||||
|
|
||||||
|
# Merge consecutive spans that have the same styling
|
||||||
|
current_run = []
|
||||||
|
for span in root.xpath('//span'):
|
||||||
|
if not current_run:
|
||||||
|
current_run.append(span)
|
||||||
|
else:
|
||||||
|
last = current_run[-1]
|
||||||
|
if mergeable(last, span):
|
||||||
|
current_run.append(span)
|
||||||
|
else:
|
||||||
|
if len(current_run) > 1:
|
||||||
|
merge_run(current_run)
|
||||||
|
current_run = [span]
|
||||||
|
|
||||||
|
# Process dir attributes
|
||||||
|
class_map = dict(itervalues(styles.classes))
|
||||||
|
parents = ('p', 'div') + tuple('h%d' % i for i in range(1, 7))
|
||||||
|
for parent in root.xpath('//*[(%s)]' % ' or '.join('name()="%s"' % t for t in parents)):
|
||||||
|
# Ensure that children of rtl parents that are not rtl have an
|
||||||
|
# explicit dir set. Also, remove dir from children if it is the same as
|
||||||
|
# that of the parent.
|
||||||
|
if len(parent):
|
||||||
|
parent_dir = parent.get('dir')
|
||||||
|
for child in parent.iterchildren('span'):
|
||||||
|
child_dir = child.get('dir')
|
||||||
|
if parent_dir == 'rtl' and child_dir != 'rtl':
|
||||||
|
child_dir = 'ltr'
|
||||||
|
child.set('dir', child_dir)
|
||||||
|
if child_dir and child_dir == parent_dir:
|
||||||
|
child.attrib.pop('dir')
|
||||||
|
|
||||||
|
# Remove unnecessary span tags that are the only child of a parent block
|
||||||
|
# element
|
||||||
|
for parent in root.xpath('//*[(%s) and count(span)=1]' % ' or '.join('name()="%s"' % t for t in parents)):
|
||||||
|
if len(parent) == 1 and not parent.text and not parent[0].tail and not parent[0].get('id', None):
|
||||||
|
# We have a block whose contents are entirely enclosed in a <span>
|
||||||
|
span = parent[0]
|
||||||
|
span_class = span.get('class', None)
|
||||||
|
span_css = class_map.get(span_class, {})
|
||||||
|
span_dir = span.get('dir')
|
||||||
|
if liftable(span_css) and (not span_dir or span_dir == parent.get('dir')):
|
||||||
|
pclass = parent.get('class', None)
|
||||||
|
if span_class:
|
||||||
|
pclass = (pclass + ' ' + span_class) if pclass else span_class
|
||||||
|
parent.set('class', pclass)
|
||||||
|
parent.text = span.text
|
||||||
|
parent.remove(span)
|
||||||
|
if span.get('lang'):
|
||||||
|
parent.set('lang', span.get('lang'))
|
||||||
|
if span.get('dir'):
|
||||||
|
parent.set('dir', span.get('dir'))
|
||||||
|
for child in span:
|
||||||
|
parent.append(child)
|
||||||
|
|
||||||
|
# Make spans whose only styling is bold or italic into <b> and <i> tags
|
||||||
|
for span in root.xpath('//span[@class and not(@style)]'):
|
||||||
|
css = class_map.get(span.get('class', None), {})
|
||||||
|
if len(css) == 1:
|
||||||
|
if css == {'font-style':'italic'}:
|
||||||
|
span.tag = 'i'
|
||||||
|
del span.attrib['class']
|
||||||
|
elif css == {'font-weight':'bold'}:
|
||||||
|
span.tag = 'b'
|
||||||
|
del span.attrib['class']
|
||||||
|
|
||||||
|
# Get rid of <span>s that have no styling
|
||||||
|
for span in root.xpath('//span[not(@class or @id or @style or @lang or @dir)]'):
|
||||||
|
lift(span)
|
||||||
|
|
||||||
|
# Convert <p><br style="page-break-after:always"> </p> style page breaks
|
||||||
|
# into something the viewer will render as a page break
|
||||||
|
for p in root.xpath('//p[br[@style="page-break-after:always"]]'):
|
||||||
|
if len(p) == 1 and (not p[0].tail or not p[0].tail.strip()):
|
||||||
|
p.remove(p[0])
|
||||||
|
prefix = p.get('style', '')
|
||||||
|
if prefix:
|
||||||
|
prefix += '; '
|
||||||
|
p.set('style', prefix + 'page-break-after:always')
|
||||||
|
p.text = NBSP if not p.text else p.text
|
||||||
|
|
||||||
|
if detect_cover:
|
||||||
|
# Check if the first image in the document is possibly a cover
|
||||||
|
img = root.xpath('//img[@src][1]')
|
||||||
|
if img:
|
||||||
|
img = img[0]
|
||||||
|
path = os.path.join(dest_dir, img.get('src'))
|
||||||
|
if os.path.exists(path) and before_count(root, img, limit=10) < 5:
|
||||||
|
from calibre.utils.imghdr import identify
|
||||||
|
try:
|
||||||
|
with lopen(path, 'rb') as imf:
|
||||||
|
fmt, width, height = identify(imf)
|
||||||
|
except:
|
||||||
|
width, height, fmt = 0, 0, None # noqa
|
||||||
|
del fmt
|
||||||
|
try:
|
||||||
|
is_cover = 0.8 <= height/width <= 1.8 and height*width >= 160000
|
||||||
|
except ZeroDivisionError:
|
||||||
|
is_cover = False
|
||||||
|
if is_cover:
|
||||||
|
log.debug('Detected an image that looks like a cover')
|
||||||
|
img.getparent().remove(img)
|
||||||
|
return path
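The `liftable()` check in cleanup.py above is a pure function, so its behavior is easy to see standalone. The copy below is reproduced from the file for illustration:

```python
def liftable(css):
    # A <span>'s styling can move to its parent block when every property
    # prefix belongs to one of these text-level groups.
    prefixes = {x.partition('-')[0] for x in css}
    return not (prefixes - {'text', 'font', 'letter', 'color', 'background'})

print(liftable({'font-weight': 'bold', 'color': 'red'}))  # True: both prefixes allowed
print(liftable({'margin-left': '2em'}))  # False: box-model styling cannot be lifted
```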

268  ebook_converter/ebooks/docx/container.py  Normal file
@@ -0,0 +1,268 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import os, sys, shutil

from lxml import etree

from calibre import walk, guess_type
from calibre.ebooks.metadata import string_to_authors, authors_to_sort_string
from calibre.ebooks.metadata.book.base import Metadata
from calibre.ebooks.docx import InvalidDOCX
from calibre.ebooks.docx.names import DOCXNamespace
from calibre.ptempfile import PersistentTemporaryDirectory
from calibre.utils.localization import canonicalize_lang
from calibre.utils.logging import default_log
from calibre.utils.zipfile import ZipFile
from calibre.utils.xml_parse import safe_xml_fromstring


def fromstring(raw, parser=None):
    return safe_xml_fromstring(raw)


# Read metadata {{{

def read_doc_props(raw, mi, XPath):
    root = fromstring(raw)
    titles = XPath('//dc:title')(root)
    if titles:
        title = titles[0].text
        if title and title.strip():
            mi.title = title.strip()
    tags = []
    for subject in XPath('//dc:subject')(root):
        if subject.text and subject.text.strip():
            tags.append(subject.text.strip().replace(',', '_'))
    for keywords in XPath('//cp:keywords')(root):
        if keywords.text and keywords.text.strip():
            for x in keywords.text.split():
                tags.extend(y.strip() for y in x.split(',') if y.strip())
    if tags:
        mi.tags = tags
    authors = XPath('//dc:creator')(root)
    aut = []
    for author in authors:
        if author.text and author.text.strip():
            aut.extend(string_to_authors(author.text))
    if aut:
        mi.authors = aut
        mi.author_sort = authors_to_sort_string(aut)

    desc = XPath('//dc:description')(root)
    if desc:
        raw = etree.tostring(desc[0], method='text', encoding='unicode')
        raw = raw.replace('_x000d_', '')  # Word 2007 mangles newlines in the summary
        mi.comments = raw.strip()

    langs = []
    for lang in XPath('//dc:language')(root):
        if lang.text and lang.text.strip():
            l = canonicalize_lang(lang.text)
            if l:
                langs.append(l)
    if langs:
        mi.languages = langs


def read_app_props(raw, mi):
    root = fromstring(raw)
    company = root.xpath('//*[local-name()="Company"]')
    if company and company[0].text and company[0].text.strip():
        mi.publisher = company[0].text.strip()


def read_default_style_language(raw, mi, XPath):
    root = fromstring(raw)
    for lang in XPath('/w:styles/w:docDefaults/w:rPrDefault/w:rPr/w:lang/@w:val')(root):
        lang = canonicalize_lang(lang)
        if lang:
            mi.languages = [lang]
            break
# }}}


class DOCX(object):

    def __init__(self, path_or_stream, log=None, extract=True):
        self.docx_is_transitional = True
        stream = path_or_stream if hasattr(path_or_stream, 'read') else open(path_or_stream, 'rb')
        self.name = getattr(stream, 'name', None) or '<stream>'
        self.log = log or default_log
        if extract:
            self.extract(stream)
        else:
            self.init_zipfile(stream)
        self.read_content_types()
        self.read_package_relationships()
        self.namespace = DOCXNamespace(self.docx_is_transitional)

    def init_zipfile(self, stream):
        self.zipf = ZipFile(stream)
        self.names = frozenset(self.zipf.namelist())

    def extract(self, stream):
        self.tdir = PersistentTemporaryDirectory('docx_container')
        try:
            zf = ZipFile(stream)
            zf.extractall(self.tdir)
        except:
            self.log.exception('DOCX appears to be invalid ZIP file, trying a'
                               ' more forgiving ZIP parser')
            from calibre.utils.localunzip import extractall
            stream.seek(0)
            extractall(stream, self.tdir)

        self.names = {}
        for f in walk(self.tdir):
            name = os.path.relpath(f, self.tdir).replace(os.sep, '/')
            self.names[name] = f

    def exists(self, name):
        return name in self.names

    def read(self, name):
        if hasattr(self, 'zipf'):
            return self.zipf.open(name).read()
        path = self.names[name]
        with open(path, 'rb') as f:
            return f.read()

    def read_content_types(self):
        try:
            raw = self.read('[Content_Types].xml')
        except KeyError:
            raise InvalidDOCX('The file %s docx file has no [Content_Types].xml' % self.name)
        root = fromstring(raw)
        self.content_types = {}
        self.default_content_types = {}
        for item in root.xpath('//*[local-name()="Types"]/*[local-name()="Default" and @Extension and @ContentType]'):
            self.default_content_types[item.get('Extension').lower()] = item.get('ContentType')
        for item in root.xpath('//*[local-name()="Types"]/*[local-name()="Override" and @PartName and @ContentType]'):
            name = item.get('PartName').lstrip('/')
            self.content_types[name] = item.get('ContentType')

    def content_type(self, name):
        if name in self.content_types:
            return self.content_types[name]
        ext = name.rpartition('.')[-1].lower()
        if ext in self.default_content_types:
            return self.default_content_types[ext]
        return guess_type(name)[0]

    def read_package_relationships(self):
        try:
            raw = self.read('_rels/.rels')
        except KeyError:
            raise InvalidDOCX('The file %s docx file has no _rels/.rels' % self.name)
        root = fromstring(raw)
        self.relationships = {}
        self.relationships_rmap = {}
        for item in root.xpath('//*[local-name()="Relationships"]/*[local-name()="Relationship" and @Type and @Target]'):
            target = item.get('Target').lstrip('/')
            typ = item.get('Type')
            if target == 'word/document.xml':
                self.docx_is_transitional = typ != 'http://purl.oclc.org/ooxml/officeDocument/relationships/officeDocument'
            self.relationships[typ] = target
            self.relationships_rmap[target] = typ

    @property
    def document_name(self):
        name = self.relationships.get(self.namespace.names['DOCUMENT'], None)
        if name is None:
            names = tuple(n for n in self.names if n == 'document.xml' or n.endswith('/document.xml'))
            if not names:
                raise InvalidDOCX('The file %s docx file has no main document' % self.name)
            name = names[0]
        return name

    @property
    def document(self):
        return fromstring(self.read(self.document_name))

    @property
    def document_relationships(self):
        return self.get_relationships(self.document_name)

    def get_relationships(self, name):
        base = '/'.join(name.split('/')[:-1])
        by_id, by_type = {}, {}
        parts = name.split('/')
        name = '/'.join(parts[:-1] + ['_rels', parts[-1] + '.rels'])
        try:
            raw = self.read(name)
        except KeyError:
            pass
        else:
            root = fromstring(raw)
            for item in root.xpath('//*[local-name()="Relationships"]/*[local-name()="Relationship" and @Type and @Target]'):
                target = item.get('Target')
                if item.get('TargetMode', None) != 'External' and not target.startswith('#'):
                    target = '/'.join((base, target.lstrip('/')))
                typ = item.get('Type')
                Id = item.get('Id')
                by_id[Id] = by_type[typ] = target

        return by_id, by_type

    def get_document_properties_names(self):
        name = self.relationships.get(self.namespace.names['DOCPROPS'], None)
        if name is None:
            names = tuple(n for n in self.names if n.lower() == 'docprops/core.xml')
            if names:
                name = names[0]
        yield name
        name = self.relationships.get(self.namespace.names['APPPROPS'], None)
        if name is None:
            names = tuple(n for n in self.names if n.lower() == 'docprops/app.xml')
            if names:
                name = names[0]
        yield name

    @property
    def metadata(self):
        mi = Metadata(_('Unknown'))
        dp_name, ap_name = self.get_document_properties_names()
        if dp_name:
            try:
                raw = self.read(dp_name)
            except KeyError:
                pass
            else:
                read_doc_props(raw, mi, self.namespace.XPath)
        if mi.is_null('language'):
            try:
                raw = self.read('word/styles.xml')
            except KeyError:
                pass
            else:
                read_default_style_language(raw, mi, self.namespace.XPath)

        ap_name = self.relationships.get(self.namespace.names['APPPROPS'], None)
        if ap_name:
            try:
                raw = self.read(ap_name)
            except KeyError:
                pass
            else:
                read_app_props(raw, mi)

        return mi

    def close(self):
        if hasattr(self, 'zipf'):
            self.zipf.close()
        else:
            try:
                shutil.rmtree(self.tdir)
            except EnvironmentError:
                pass


if __name__ == '__main__':
    d = DOCX(sys.argv[-1], extract=False)
    print(d.metadata)
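`read_content_types()` and `content_type()` above implement the OPC lookup rule: an `Override` entry (matched by part name) wins over a `Default` entry (matched by extension). A stdlib-only sketch of that logic, run against a hypothetical minimal `[Content_Types].xml`:

```python
import xml.etree.ElementTree as ET

# A hypothetical, trimmed-down [Content_Types].xml for illustration.
RAW = b'''<?xml version="1.0"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="png" ContentType="image/png"/>
  <Override PartName="/word/document.xml"
    ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>'''

root = ET.fromstring(RAW)
defaults, overrides = {}, {}
for item in root.iter():
    tag = item.tag.rpartition('}')[-1]  # strip the XML namespace
    if tag == 'Default':
        defaults[item.get('Extension').lower()] = item.get('ContentType')
    elif tag == 'Override':
        overrides[item.get('PartName').lstrip('/')] = item.get('ContentType')

def content_type(name):
    # Overrides win; otherwise fall back to the extension default.
    if name in overrides:
        return overrides[name]
    return defaults.get(name.rpartition('.')[-1].lower())

print(content_type('word/media/image1.png'))  # image/png
```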

276  ebook_converter/ebooks/docx/fields.py  Normal file
@@ -0,0 +1,276 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import re

from calibre.ebooks.docx.index import process_index, polish_index_markup
from polyglot.builtins import iteritems, native_string_type


class Field(object):

    def __init__(self, start):
        self.start = start
        self.end = None
        self.contents = []
        self.buf = []
        self.instructions = None
        self.name = None

    def add_instr(self, elem):
        self.add_raw(elem.text)

    def add_raw(self, raw):
        if not raw:
            return
        if self.name is None:
            # There are cases where partial index entries end with
            # a significant space, along the lines of
            # <>Summary <> ... <>Hearing<>.
            # No known examples of starting with a space yet.
            # self.name, raw = raw.strip().partition(' ')[0::2]
            self.name, raw = raw.lstrip().partition(' ')[0::2]
        self.buf.append(raw)

    def finalize(self):
        self.instructions = ''.join(self.buf)
        del self.buf


WORD, FLAG = 0, 1
scanner = re.Scanner([
    (r'\\\S{1}', lambda s, t: (t, FLAG)),  # A flag of the form \x
    (r'"[^"]*"', lambda s, t: (t[1:-1], WORD)),  # Quoted word
    (r'[^\s\\"]\S*', lambda s, t: (t, WORD)),  # A non-quoted word, must not start with a backslash or a space or a quote
    (r'\s+', None),
], flags=re.DOTALL)

null = object()


def parser(name, field_map, default_field_name=None):

    field_map = dict((x.split(':') for x in field_map.split()))

    def parse(raw, log=None):
        ans = {}
        last_option = None
        raw = raw.replace('\\\\', '\x01').replace('\\"', '\x02')
        for token, token_type in scanner.scan(raw)[0]:
            token = token.replace('\x01', '\\').replace('\x02', '"')
            if token_type is FLAG:
                last_option = field_map.get(token[1], null)
                if last_option is not None:
                    ans[last_option] = None
            elif token_type is WORD:
                if last_option is None:
                    ans[default_field_name] = token
                else:
                    ans[last_option] = token
                    last_option = None
        ans.pop(null, None)
        return ans

    parse.__name__ = native_string_type('parse_' + name)

    return parse


parse_hyperlink = parser('hyperlink',
    'l:anchor m:image-map n:target o:title t:target', 'url')

parse_xe = parser('xe',
    'b:bold i:italic f:entry-type r:page-range-bookmark t:page-number-text y:yomi', 'text')

parse_index = parser('index',
    'b:bookmark c:columns-per-page d:sequence-separator e:first-page-number-separator'
    ' f:entry-type g:page-range-separator h:heading k:crossref-separator'
    ' l:page-number-separator p:letter-range s:sequence-name r:run-together y:yomi z:langcode')

parse_ref = parser('ref',
    'd:separator f:footnote h:hyperlink n:number p:position r:relative-number t:suppress w:number-full-context')

parse_noteref = parser('noteref',
    'f:footnote h:hyperlink p:position')


class Fields(object):

    def __init__(self, namespace):
        self.namespace = namespace
        self.fields = []
        self.index_bookmark_counter = 0
        self.index_bookmark_prefix = 'index-'

    def __call__(self, doc, log):
        all_ids = frozenset(self.namespace.XPath('//*/@w:id')(doc))
        c = 0
        while self.index_bookmark_prefix in all_ids:
            c += 1
            self.index_bookmark_prefix = self.index_bookmark_prefix.replace('-', '%d-' % c)
        stack = []
        for elem in self.namespace.XPath(
                '//*[name()="w:p" or name()="w:r" or'
                ' name()="w:instrText" or'
                ' (name()="w:fldChar" and (@w:fldCharType="begin" or @w:fldCharType="end") or'
                ' name()="w:fldSimple")]')(doc):
            if elem.tag.endswith('}fldChar'):
                typ = self.namespace.get(elem, 'w:fldCharType')
                if typ == 'begin':
                    stack.append(Field(elem))
                    self.fields.append(stack[-1])
                else:
                    try:
                        stack.pop().end = elem
                    except IndexError:
                        pass
            elif elem.tag.endswith('}instrText'):
                if stack:
                    stack[-1].add_instr(elem)
            elif elem.tag.endswith('}fldSimple'):
                field = Field(elem)
                instr = self.namespace.get(elem, 'w:instr')
                if instr:
                    field.add_raw(instr)
                self.fields.append(field)
                for r in self.namespace.XPath('descendant::w:r')(elem):
                    field.contents.append(r)
            else:
                if stack:
                    stack[-1].contents.append(elem)

        field_types = ('hyperlink', 'xe', 'index', 'ref', 'noteref')
        parsers = {x.upper():getattr(self, 'parse_'+x) for x in field_types}
        parsers.update({x:getattr(self, 'parse_'+x) for x in field_types})
        field_parsers = {f.upper():globals()['parse_%s' % f] for f in field_types}
        field_parsers.update({f:globals()['parse_%s' % f] for f in field_types})

        for f in field_types:
            setattr(self, '%s_fields' % f, [])
        unknown_fields = {'TOC', 'toc', 'PAGEREF', 'pageref'}  # The TOC and PAGEREF fields are handled separately

        for field in self.fields:
            field.finalize()
            if field.instructions:
                func = parsers.get(field.name, None)
                if func is not None:
                    func(field, field_parsers[field.name], log)
                elif field.name not in unknown_fields:
                    log.warn('Encountered unknown field: %s, ignoring it.' % field.name)
                    unknown_fields.add(field.name)

    def get_runs(self, field):
        all_runs = []
        current_runs = []
        # We only handle spans in a single paragraph
        # being wrapped in <a>
        for x in field.contents:
            if x.tag.endswith('}p'):
                if current_runs:
                    all_runs.append(current_runs)
                current_runs = []
            elif x.tag.endswith('}r'):
                current_runs.append(x)
        if current_runs:
            all_runs.append(current_runs)
        return all_runs

    def parse_hyperlink(self, field, parse_func, log):
        # Parse hyperlink fields
        hl = parse_func(field.instructions, log)
        if hl:
            if 'target' in hl and hl['target'] is None:
                hl['target'] = '_blank'
            for runs in self.get_runs(field):
                self.hyperlink_fields.append((hl, runs))

    def parse_ref(self, field, parse_func, log):
        ref = parse_func(field.instructions, log)
        dest = ref.get(None, None)
        if dest is not None and 'hyperlink' in ref:
            for runs in self.get_runs(field):
                self.hyperlink_fields.append(({'anchor':dest}, runs))
        else:
            log.warn('Unsupported reference field (%s), ignoring: %r' % (field.name, ref))

    parse_noteref = parse_ref

    def parse_xe(self, field, parse_func, log):
        # Parse XE fields
        if None in (field.start, field.end):
            return
        xe = parse_func(field.instructions, log)
        if xe:
            # We insert a synthetic bookmark around this index item so that we
            # can link to it later
            def WORD(x):
                return self.namespace.expand('w:' + x)
            self.index_bookmark_counter += 1
            bmark = xe['anchor'] = '%s%d' % (self.index_bookmark_prefix, self.index_bookmark_counter)
            p = field.start.getparent()
            bm = p.makeelement(WORD('bookmarkStart'))
            bm.set(WORD('id'), bmark), bm.set(WORD('name'), bmark)
            p.insert(p.index(field.start), bm)
            p = field.end.getparent()
            bm = p.makeelement(WORD('bookmarkEnd'))
            bm.set(WORD('id'), bmark)
            p.insert(p.index(field.end) + 1, bm)
            xe['start_elem'] = field.start
            self.xe_fields.append(xe)

    def parse_index(self, field, parse_func, log):
        if not field.contents:
            return
        idx = parse_func(field.instructions, log)
        hyperlinks, blocks = process_index(field, idx, self.xe_fields, log, self.namespace.XPath, self.namespace.expand)
        if not blocks:
            return
        for anchor, run in hyperlinks:
            self.hyperlink_fields.append(({'anchor':anchor}, [run]))

        self.index_fields.append((idx, blocks))

    def polish_markup(self, object_map):
        if not self.index_fields:
            return
        rmap = {v:k for k, v in iteritems(object_map)}
        for idx, blocks in self.index_fields:
            polish_index_markup(idx, [rmap[b] for b in blocks])


def test_parse_fields(return_tests=False):
    import unittest

    class TestParseFields(unittest.TestCase):

        def test_hyperlink(self):
            ae = lambda x, y: self.assertEqual(parse_hyperlink(x, None), y)
            ae(r'\l anchor1', {'anchor':'anchor1'})
            ae(r'www.calibre-ebook.com', {'url':'www.calibre-ebook.com'})
            ae(r'www.calibre-ebook.com \t target \o tt', {'url':'www.calibre-ebook.com', 'target':'target', 'title': 'tt'})
            ae(r'"c:\\Some Folder"', {'url': 'c:\\Some Folder'})
            ae(r'xxxx \y yyyy', {'url': 'xxxx'})

        def test_xe(self):
            ae = lambda x, y: self.assertEqual(parse_xe(x, None), y)
            ae(r'"some name"', {'text':'some name'})
            ae(r'name \b \i', {'text':'name', 'bold':None, 'italic':None})
            ae(r'xxx \y a', {'text':'xxx', 'yomi':'a'})

        def test_index(self):
            ae = lambda x, y: self.assertEqual(parse_index(x, None), y)
            ae(r'', {})
            ae(r'\b \c 1', {'bookmark':None, 'columns-per-page': '1'})

    suite = unittest.TestLoader().loadTestsFromTestCase(TestParseFields)
    if return_tests:
        return suite
    unittest.TextTestRunner(verbosity=4).run(suite)


if __name__ == '__main__':
    test_parse_fields()
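The `parser()` factory in fields.py above tokenizes Word field instructions with `re.Scanner`, splitting them into backslash switches and (possibly quoted) words. A condensed standalone version for the HYPERLINK subset of switches, with the same `null`-sentinel handling of unknown switches:

```python
import re

WORD, FLAG = 0, 1
scanner = re.Scanner([
    (r'\\\S{1}', lambda s, t: (t, FLAG)),        # a switch of the form \x
    (r'"[^"]*"', lambda s, t: (t[1:-1], WORD)),  # quoted word, quotes stripped
    (r'[^\s\\"]\S*', lambda s, t: (t, WORD)),    # bare word
    (r'\s+', None),                              # skip whitespace
], flags=re.DOTALL)

null = object()

def parse_hyperlink(raw):
    # Mirrors the generated parse() above for HYPERLINK's \l, \t and \o switches.
    field_map = {'l': 'anchor', 't': 'target', 'o': 'title'}
    ans, last = {}, None
    for token, kind in scanner.scan(raw)[0]:
        if kind is FLAG:
            last = field_map.get(token[1], null)  # unknown switches map to a sentinel
            ans[last] = None
        elif kind is WORD:
            ans['url' if last is None else last] = token
            last = None
    ans.pop(null, None)  # drop the value of any unknown switch
    return ans

print(parse_hyperlink(r'www.calibre-ebook.com \t target \o tt'))
```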

197  ebook_converter/ebooks/docx/fonts.py  Normal file
@@ -0,0 +1,197 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import os, re
from collections import namedtuple

from calibre.ebooks.docx.block_styles import binary_property, inherit
from calibre.utils.filenames import ascii_filename
from calibre.utils.fonts.scanner import font_scanner, NoFonts
from calibre.utils.fonts.utils import panose_to_css_generic_family, is_truetype_font
from calibre.utils.icu import ord_string
from polyglot.builtins import codepoint_to_chr, iteritems, range

Embed = namedtuple('Embed', 'name key subsetted')


def has_system_fonts(name):
    try:
        return bool(font_scanner.fonts_for_family(name))
    except NoFonts:
        return False


def get_variant(bold=False, italic=False):
    return {(False, False):'Regular', (False, True):'Italic',
            (True, False):'Bold', (True, True):'BoldItalic'}[(bold, italic)]


def find_fonts_matching(fonts, style='normal', stretch='normal'):
    for font in fonts:
        if font['font-style'] == style and font['font-stretch'] == stretch:
            yield font


def weight_key(font):
    w = font['font-weight']
    try:
        return abs(int(w) - 400)
    except Exception:
        return abs({'normal': 400, 'bold': 700}.get(w, 1000000) - 400)


def get_best_font(fonts, style, stretch):
    try:
        return sorted(find_fonts_matching(fonts, style, stretch), key=weight_key)[0]
    except Exception:
        pass


class Family(object):

    def __init__(self, elem, embed_relationships, XPath, get):
        self.name = self.family_name = get(elem, 'w:name')
        self.alt_names = tuple(get(x, 'w:val') for x in XPath('./w:altName')(elem))
        if self.alt_names and not has_system_fonts(self.name):
            for x in self.alt_names:
                if has_system_fonts(x):
                    self.family_name = x
                    break

        self.embedded = {}
        for x in ('Regular', 'Bold', 'Italic', 'BoldItalic'):
            for y in XPath('./w:embed%s[@r:id]' % x)(elem):
                rid = get(y, 'r:id')
                key = get(y, 'w:fontKey')
                subsetted = get(y, 'w:subsetted') in {'1', 'true', 'on'}
                if rid in embed_relationships:
                    self.embedded[x] = Embed(embed_relationships[rid], key, subsetted)

        self.generic_family = 'auto'
        for x in XPath('./w:family[@w:val]')(elem):
            self.generic_family = get(x, 'w:val', 'auto')

        ntt = binary_property(elem, 'notTrueType', XPath, get)
        self.is_ttf = ntt is inherit or not ntt

        self.panose1 = None
        self.panose_name = None
        for x in XPath('./w:panose1[@w:val]')(elem):
            try:
                v = get(x, 'w:val')
                v = tuple(int(v[i:i+2], 16) for i in range(0, len(v), 2))
            except (TypeError, ValueError, IndexError):
                pass
            else:
                self.panose1 = v
                self.panose_name = panose_to_css_generic_family(v)

        self.css_generic_family = {'roman':'serif', 'swiss':'sans-serif', 'modern':'monospace',
                                   'decorative':'fantasy', 'script':'cursive'}.get(self.generic_family, None)
        self.css_generic_family = self.css_generic_family or self.panose_name or 'serif'


SYMBOL_MAPS = {  # {{{
|
||||||
|
'Wingdings': (' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '🖉', '✂', '✁', '👓', '🕭', '🕮', '🕯', '🕿', '✆', '🖂', '🖃', '📪', '📫', '📬', '📭', '🗀', '🗁', '🗎', '🗏', '🗐', '🗄', '⏳', '🖮', '🖰', '🖲', '🖳', '🖴', '🖫', '🖬', '✇', '✍', '🖎', '✌', '🖏', '👍', '👎', '☜', '☞', '☜', '🖗', '🖐', '☺', '😐', '☹', '💣', '🕱', '🏳', '🏱', '✈', '☼', '🌢', '❄', '🕆', '✞', '🕈', '✠', '✡', '☪', '☯', '🕉', '☸', '♈', '♉', '♊', '♋', '♌', '♍', '♎', '♏', '♐', '♑', '♒', '♓', '🙰', '🙵', '⚫', '🔾', '◼', '🞏', '🞐', '❑', '❒', '🞟', '⧫', '◆', '❖', '🞙', '⌧', '⮹', '⌘', '🏵', '🏶', '🙶', '🙷', ' ', '🄋', '➀', '➁', '➂', '➃', '➄', '➅', '➆', '➇', '➈', '➉', '🄌', '➊', '➋', '➌', '➍', '➎', '➏', '➐', '➑', '➒', '➓', '🙢', '🙠', '🙡', '🙣', '🙦', '🙤', '🙥', '🙧', '∙', '•', '⬝', '⭘', '🞆', '🞈', '🞊', '🞋', '🔿', '▪', '🞎', '🟀', '🟁', '★', '🟋', '🟏', '🟓', '🟑', '⯐', '⌖', '⯎', '⯏', '⯑', '✪', '✰', '🕐', '🕑', '🕒', '🕓', '🕔', '🕕', '🕖', '🕗', '🕘', '🕙', '🕚', '🕛', '⮰', '⮱', '⮲', '⮳', '⮴', '⮵', '⮶', '⮷', '🙪', '🙫', '🙕', '🙔', '🙗', '🙖', '🙐', '🙑', '🙒', '🙓', '⌫', '⌦', '⮘', '⮚', '⮙', '⮛', '⮈', '⮊', '⮉', '⮋', '🡨', '🡪', '🡩', '🡫', '🡬', '🡭', '🡯', '🡮', '🡸', '🡺', '🡹', '🡻', '🡼', '🡽', '🡿', '🡾', '⇦', '⇨', '⇧', '⇩', '⬄', '⇳', '⬁', '⬀', '⬃', '⬂', '🢬', '🢭', '🗶', '✓', '🗷', '🗹', ' '), # noqa
'Wingdings 2': (' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '🖊', '🖋', '🖌', '🖍', '✄', '✀', '🕾', '🕽', '🗅', '🗆', '🗇', '🗈', '🗉', '🗊', '🗋', '🗌', '🗍', '📋', '🗑', '🗔', '🖵', '🖶', '🖷', '🖸', '🖭', '🖯', '🖱', '🖒', '🖓', '🖘', '🖙', '🖚', '🖛', '👈', '👉', '🖜', '🖝', '🖞', '🖟', '🖠', '🖡', '👆', '👇', '🖢', '🖣', '🖑', '🗴', '🗸', '🗵', '☑', '⮽', '☒', '⮾', '⮿', '🛇', '⦸', '🙱', '🙴', '🙲', '🙳', '‽', '🙹', '🙺', '🙻', '🙦', '🙤', '🙥', '🙧', '🙚', '🙘', '🙙', '🙛', '⓪', '①', '②', '③', '④', '⑤', '⑥', '⑦', '⑧', '⑨', '⑩', '⓿', '❶', '❷', '❸', '❹', '❺', '❻', '❼', '❽', '❾', '❿', ' ', '☉', '🌕', '☽', '☾', '⸿', '✝', '🕇', '🕜', '🕝', '🕞', '🕟', '🕠', '🕡', '🕢', '🕣', '🕤', '🕥', '🕦', '🕧', '🙨', '🙩', '⋅', '🞄', '⦁', '●', '●', '🞅', '🞇', '🞉', '⊙', '⦿', '🞌', '🞍', '◾', '■', '□', '🞑', '🞒', '🞓', '🞔', '▣', '🞕', '🞖', '🞗', '🞘', '⬩', '⬥', '◇', '🞚', '◈', '🞛', '🞜', '🞝', '🞞', '⬪', '⬧', '◊', '🞠', '◖', '◗', '⯊', '⯋', '⯀', '⯁', '⬟', '⯂', '⬣', '⬢', '⯃', '⯄', '🞡', '🞢', '🞣', '🞤', '🞥', '🞦', '🞧', '🞨', '🞩', '🞪', '🞫', '🞬', '🞭', '🞮', '🞯', '🞰', '🞱', '🞲', '🞳', '🞴', '🞵', '🞶', '🞷', '🞸', '🞹', '🞺', '🞻', '🞼', '🞽', '🞾', '🞿', '🟀', '🟂', '🟄', '🟆', '🟉', '🟊', '✶', '🟌', '🟎', '🟐', '🟒', '✹', '🟃', '🟇', '✯', '🟍', '🟔', '⯌', '⯍', '※', '⁂', ' ', ' ', ' ', ' ', ' ', ' ',), # noqa
'Wingdings 3': (' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '⭠', '⭢', '⭡', '⭣', '⭤', '⭥', '⭧', '⭦', '⭰', '⭲', '⭱', '⭳', '⭶', '⭸', '⭻', '⭽', '⭤', '⭥', '⭪', '⭬', '⭫', '⭭', '⭍', '⮠', '⮡', '⮢', '⮣', '⮤', '⮥', '⮦', '⮧', '⮐', '⮑', '⮒', '⮓', '⮀', '⮃', '⭾', '⭿', '⮄', '⮆', '⮅', '⮇', '⮏', '⮍', '⮎', '⮌', '⭮', '⭯', '⎋', '⌤', '⌃', '⌥', '␣', '⍽', '⇪', '⮸', '🢠', '🢡', '🢢', '🢣', '🢤', '🢥', '🢦', '🢧', '🢨', '🢩', '🢪', '🢫', '🡐', '🡒', '🡑', '🡓', '🡔', '🡕', '🡗', '🡖', '🡘', '🡙', '▲', '▼', '△', '▽', '◀', '▶', '◁', '▷', '◣', '◢', '◤', '◥', '🞀', '🞂', '🞁', ' ', '🞃', '⯅', '⯆', '⯇', '⯈', '⮜', '⮞', '⮝', '⮟', '🠐', '🠒', '🠑', '🠓', '🠔', '🠖', '🠕', '🠗', '🠘', '🠚', '🠙', '🠛', '🠜', '🠞', '🠝', '🠟', '🠀', '🠂', '🠁', '🠃', '🠄', '🠆', '🠅', '🠇', '🠈', '🠊', '🠉', '🠋', '🠠', '🠢', '🠤', '🠦', '🠨', '🠪', '🠬', '🢜', '🢝', '🢞', '🢟', '🠮', '🠰', '🠲', '🠴', '🠶', '🠸', '🠺', '🠹', '🠻', '🢘', '🢚', '🢙', '🢛', '🠼', '🠾', '🠽', '🠿', '🡀', '🡂', '🡁', '🡃', '🡄', '🡆', '🡅', '🡇', '⮨', '⮩', '⮪', '⮫', '⮬', '⮭', '⮮', '⮯', '🡠', '🡢', '🡡', '🡣', '🡤', '🡥', '🡧', '🡦', '🡰', '🡲', '🡱', '🡳', '🡴', '🡵', '🡷', '🡶', '🢀', '🢂', '🢁', '🢃', '🢄', '🢅', '🢇', '🢆', '🢐', '🢒', '🢑', '🢓', '🢔', '🢕', '🢗', '🢖', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',), # noqa
'Webdings': (' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '🕷', '🕸', '🕲', '🕶', '🏆', '🎖', '🖇', '🗨', '🗩', '🗰', '🗱', '🌶', '🎗', '🙾', '🙼', '🗕', '🗖', '🗗', '⏴', '⏵', '⏶', '⏷', '⏪', '⏩', '⏮', '⏭', '⏸', '⏹', '⏺', '🗚', '🗳', '🛠', '🏗', '🏘', '🏙', '🏚', '🏜', '🏭', '🏛', '🏠', '🏖', '🏝', '🛣', '🔍', '🏔', '👁', '👂', '🏞', '🏕', '🛤', '🏟', '🛳', '🕬', '🕫', '🕨', '🔈', '🎔', '🎕', '🗬', '🙽', '🗭', '🗪', '🗫', '⮔', '✔', '🚲', '⬜', '🛡', '📦', '🛱', '⬛', '🚑', '🛈', '🛩', '🛰', '🟈', '🕴', '⬤', '🛥', '🚔', '🗘', '🗙', '❓', '🛲', '🚇', '🚍', '⛳', '⦸', '⊖', '🚭', '🗮', '⏐', '🗯', '🗲', ' ', '🚹', '🚺', '🛉', '🛊', '🚼', '👽', '🏋', '⛷', '🏂', '🏌', '🏊', '🏄', '🏍', '🏎', '🚘', '🗠', '🛢', '📠', '🏷', '📣', '👪', '🗡', '🗢', '🗣', '✯', '🖄', '🖅', '🖃', '🖆', '🖹', '🖺', '🖻', '🕵', '🕰', '🖽', '🖾', '📋', '🗒', '🗓', '🕮', '📚', '🗞', '🗟', '🗃', '🗂', '🖼', '🎭', '🎜', '🎘', '🎙', '🎧', '💿', '🎞', '📷', '🎟', '🎬', '📽', '📹', '📾', '📻', '🎚', '🎛', '📺', '💻', '🖥', '🖦', '🖧', '🍹', '🎮', '🎮', '🕻', '🕼', '🖁', '🖀', '🖨', '🖩', '🖿', '🖪', '🗜', '🔒', '🔓', '🗝', '📥', '📤', '🕳', '🌣', '🌤', '🌥', '🌦', '☁', '🌨', '🌧', '🌩', '🌪', '🌬', '🌫', '🌜', '🌡', '🛋', '🛏', '🍽', '🍸', '🛎', '🛍', 'Ⓟ', '♿', '🛆', '🖈', '🎓', '🗤', '🗥', '🗦', '🗧', '🛪', '🐿', '🐦', '🐟', '🐕', '🐈', '🙬', '🙮', '🙭', '🙯', '🗺', '🌍', '🌏', '🌎', '🕊',), # noqa
'Symbol': (' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '!', '∀', '#', '∃', '%', '&', '∍', '(', ')', '*', '+', ',', '−', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '≅', 'Α', 'Β', 'Χ', 'Δ', 'Ε', 'Φ', 'Γ', 'Η', 'Ι', 'ϑ', 'Λ', 'Μ', 'Ν', 'Ξ', 'Ο', 'Π', 'Θ', 'Ρ', 'Σ', 'Τ', 'Υ', 'ς', 'Ω', 'Ξ', 'Ψ', 'Ζ', '[', '∴', ']', '⊥', '_', '', 'α', 'β', 'χ', 'δ', 'ε', 'φ', 'γ', 'η', 'ι', 'ϕ', 'λ', 'μ', 'ν', 'ξ', 'ο', 'π', 'θ', 'ρ', 'σ', 'τ', 'υ', 'ϖ', 'ω', 'ξ', 'ψ', 'ζ', '{', '|', '}', '~', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '€', 'ϒ', '′', '≤', '⁄', '∞', 'ƒ', '♣', '♥', '♦', '♠', '↔', '←', '↑', '→', '↓', '°', '±', '″', '≥', '×', '∝', '∂', '•', '÷', '≠', '≡', '≈', '…', '⏐', '⎯', '↲', 'ℵ', 'ℑ', 'ℜ', '℘', '⊗', '⊕', '∅', '∩', '∪', '⊃', '⊇', '⊄', '⊂', '⊆', '∈', '∉', '∠', '∂', '®', '©', '™', '∏', '√', '⋅', '¬', '∦', '∧', '⇔', '⇐', '⇑', '⇒', '⇓', '◊', '〈', '®', '©', '™', '∑', '⎛', '⎜', '⎝', '⎡', '⎢', '⎣', '⎧', '⎨', '⎩', '⎪', ' ', '〉', '∫', '⌠', '⎮', '⌡', '⎞', '⎟', '⎠', '⎤', '⎥', '⎦', '⎪', '⎫', '⎬', ' ',), # noqa
} # }}}

SYMBOL_FONT_NAMES = frozenset(n.lower() for n in SYMBOL_MAPS)


def is_symbol_font(family):
    try:
        return family.lower() in SYMBOL_FONT_NAMES
    except AttributeError:
        return False


def do_map(m, points):
    base = 0xf000
    limit = len(m) + base
    for p in points:
        if base < p < limit:
            yield m[p - base]
        else:
            yield codepoint_to_chr(p)


def map_symbol_text(text, font):
    m = SYMBOL_MAPS[font]
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    return ''.join(do_map(m, ord_string(text)))


class Fonts(object):

    def __init__(self, namespace):
        self.namespace = namespace
        self.fonts = {}
        self.used = set()

    def __call__(self, root, embed_relationships, docx, dest_dir):
        for elem in self.namespace.XPath('//w:font[@w:name]')(root):
            self.fonts[self.namespace.get(elem, 'w:name')] = Family(elem, embed_relationships, self.namespace.XPath, self.namespace.get)

    def family_for(self, name, bold=False, italic=False):
        f = self.fonts.get(name, None)
        if f is None:
            return 'serif'
        variant = get_variant(bold, italic)
        self.used.add((name, variant))
        name = f.name if variant in f.embedded else f.family_name
        if is_symbol_font(name):
            return name
        return '"%s", %s' % (name.replace('"', ''), f.css_generic_family)

    def embed_fonts(self, dest_dir, docx):
        defs = []
        dest_dir = os.path.join(dest_dir, 'fonts')
        for name, variant in self.used:
            f = self.fonts[name]
            if variant in f.embedded:
                if not os.path.exists(dest_dir):
                    os.mkdir(dest_dir)
                fname = self.write(name, dest_dir, docx, variant)
                if fname is not None:
                    d = {'font-family':'"%s"' % name.replace('"', ''), 'src': 'url("fonts/%s")' % fname}
                    if 'Bold' in variant:
                        d['font-weight'] = 'bold'
                    if 'Italic' in variant:
                        d['font-style'] = 'italic'
                    d = ['%s: %s' % (k, v) for k, v in iteritems(d)]
                    d = ';\n\t'.join(d)
                    defs.append('@font-face {\n\t%s\n}\n' % d)
        return '\n'.join(defs)

    def write(self, name, dest_dir, docx, variant):
        f = self.fonts[name]
        ef = f.embedded[variant]
        raw = docx.read(ef.name)
        prefix = raw[:32]
        if ef.key:
            key = re.sub(r'[^A-Fa-f0-9]', '', ef.key)
            key = bytearray(reversed(tuple(int(key[i:i+2], 16) for i in range(0, len(key), 2))))
            prefix = bytearray(prefix)
            prefix = bytes(bytearray(prefix[i]^key[i % len(key)] for i in range(len(prefix))))
        if not is_truetype_font(prefix):
            return None
        ext = 'otf' if prefix.startswith(b'OTTO') else 'ttf'
        fname = ascii_filename('%s - %s.%s' % (name, variant, ext))
        with open(os.path.join(dest_dir, fname), 'wb') as dest:
            dest.write(prefix)
            dest.write(raw[32:])

        return fname
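As an illustration of the private-use translation performed by `do_map()` above: Word stores symbol-font glyphs at codepoints offset from U+F000, and each codepoint that falls inside the table's range is replaced by its mapped character while everything else passes through unchanged. The following is a standalone sketch, not part of fonts.py; the tiny table is invented for illustration (the real tables are the SYMBOL_MAPS tuples).

```python
# Standalone sketch of the U+F000 private-use mapping done by do_map() above.
SYMBOL_BASE = 0xF000

def map_points(table, points):
    limit = SYMBOL_BASE + len(table)
    out = []
    for p in points:
        if SYMBOL_BASE < p < limit:
            out.append(table[p - SYMBOL_BASE])  # symbol-font glyph
        else:
            out.append(chr(p))  # ordinary codepoint, pass through
    return ''.join(out)

# A made-up table: pretend slot 0x21 holds a check mark.
table = [' '] * 0x40
table[0x21] = '\u2713'
print(map_points(table, [ord('A'), 0xF021]))  # -> A✓
```

Note that the comparison is strict (`base < p`), so U+F000 itself is passed through rather than looked up, matching the behaviour of `do_map()`.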

ebook_converter/ebooks/docx/footnotes.py (new file, 65 lines)
@@ -0,0 +1,65 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

from collections import OrderedDict
from polyglot.builtins import iteritems, unicode_type


class Note(object):

    def __init__(self, namespace, parent, rels):
        self.type = namespace.get(parent, 'w:type', 'normal')
        self.parent = parent
        self.rels = rels
        self.namespace = namespace

    def __iter__(self):
        for p in self.namespace.descendants(self.parent, 'w:p', 'w:tbl'):
            yield p


class Footnotes(object):

    def __init__(self, namespace):
        self.namespace = namespace
        self.footnotes = {}
        self.endnotes = {}
        self.counter = 0
        self.notes = OrderedDict()

    def __call__(self, footnotes, footnotes_rels, endnotes, endnotes_rels):
        XPath, get = self.namespace.XPath, self.namespace.get
        if footnotes is not None:
            for footnote in XPath('./w:footnote[@w:id]')(footnotes):
                fid = get(footnote, 'w:id')
                if fid:
                    self.footnotes[fid] = Note(self.namespace, footnote, footnotes_rels)

        if endnotes is not None:
            for endnote in XPath('./w:endnote[@w:id]')(endnotes):
                fid = get(endnote, 'w:id')
                if fid:
                    self.endnotes[fid] = Note(self.namespace, endnote, endnotes_rels)

    def get_ref(self, ref):
        fid = self.namespace.get(ref, 'w:id')
        notes = self.footnotes if ref.tag.endswith('}footnoteReference') else self.endnotes
        note = notes.get(fid, None)
        if note is not None and note.type == 'normal':
            self.counter += 1
            anchor = 'note_%d' % self.counter
            self.notes[anchor] = (unicode_type(self.counter), note)
            return anchor, unicode_type(self.counter)
        return None, None

    def __iter__(self):
        for anchor, (counter, note) in iteritems(self.notes):
            yield anchor, counter, note

    @property
    def has_notes(self):
        return bool(self.notes)
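A minimal sketch of the bookkeeping in `Footnotes.get_ref()` above (names here are illustrative, not calibre API): each referenced note gets a sequential number and a stable anchor name, recorded in insertion order so that iterating the collection later emits the notes in the order they were first referenced.

```python
# Illustrative sketch of sequential note numbering with stable anchors,
# as done by Footnotes.get_ref() above. Not part of footnotes.py.
from collections import OrderedDict

class NoteRefs(object):
    def __init__(self):
        self.counter = 0
        self.notes = OrderedDict()  # anchor -> (number, note), insertion order

    def get_ref(self, note):
        self.counter += 1
        anchor = 'note_%d' % self.counter
        self.notes[anchor] = (str(self.counter), note)
        return anchor, str(self.counter)

refs = NoteRefs()
print(refs.get_ref('first note'))   # -> ('note_1', '1')
print(refs.get_ref('second note'))  # -> ('note_2', '2')
```

The real method additionally skips notes whose `w:type` is not `normal` (e.g. separators), returning `(None, None)` for them.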

ebook_converter/ebooks/docx/images.py (new file, 343 lines)
@@ -0,0 +1,343 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import os

from lxml.html.builder import IMG, HR

from calibre.constants import iswindows
from calibre.ebooks.docx.names import barename
from calibre.utils.filenames import ascii_filename
from calibre.utils.img import resize_to_fit, image_to_data
from calibre.utils.imghdr import what
from polyglot.builtins import iteritems, itervalues


class LinkedImageNotFound(ValueError):

    def __init__(self, fname):
        ValueError.__init__(self, fname)
        self.fname = fname


def image_filename(x):
    return ascii_filename(x).replace(' ', '_').replace('#', '_')


def emu_to_pt(x):
    return x / 12700


def pt_to_emu(x):
    return int(x * 12700)


def get_image_properties(parent, XPath, get):
    width = height = None
    for extent in XPath('./wp:extent')(parent):
        try:
            width = emu_to_pt(int(extent.get('cx')))
        except (TypeError, ValueError):
            pass
        try:
            height = emu_to_pt(int(extent.get('cy')))
        except (TypeError, ValueError):
            pass
    ans = {}
    if width is not None:
        ans['width'] = '%.3gpt' % width
    if height is not None:
        ans['height'] = '%.3gpt' % height

    alt = None
    title = None
    for docPr in XPath('./wp:docPr')(parent):
        alt = docPr.get('descr') or alt
        title = docPr.get('title') or title
        if docPr.get('hidden', None) in {'true', 'on', '1'}:
            ans['display'] = 'none'

    return ans, alt, title


def get_image_margins(elem):
    ans = {}
    for w, css in iteritems({'L':'left', 'T':'top', 'R':'right', 'B':'bottom'}):
        val = elem.get('dist%s' % w, None)
        if val is not None:
            try:
                # The dist* attribute value is a string, convert it before
                # the EMU division
                val = emu_to_pt(int(val))
            except (TypeError, ValueError):
                continue
            ans['padding-%s' % css] = '%.3gpt' % val
    return ans


def get_hpos(anchor, page_width, XPath, get, width_frac):
    for ph in XPath('./wp:positionH')(anchor):
        rp = ph.get('relativeFrom', None)
        if rp == 'leftMargin':
            return 0 + width_frac
        if rp == 'rightMargin':
            return 1 + width_frac
        al = None
        almap = {'left':0, 'center':0.5, 'right':1}
        for align in XPath('./wp:align')(ph):
            al = almap.get(align.text)
            if al is not None:
                if rp == 'page':
                    return al
                return al + width_frac
        for po in XPath('./wp:posOffset')(ph):
            try:
                pos = emu_to_pt(int(po.text))
            except (TypeError, ValueError):
                continue
            return pos/page_width + width_frac

    for sp in XPath('./wp:simplePos')(anchor):
        try:
            # The x attribute value is a string, convert it before the EMU
            # division
            x = emu_to_pt(int(sp.get('x', None)))
        except (TypeError, ValueError):
            continue
        return x/page_width + width_frac

    return 0


class Images(object):

    def __init__(self, namespace, log):
        self.namespace = namespace
        self.rid_map = {}
        self.used = {}
        self.resized = {}
        self.names = set()
        self.all_images = set()
        self.links = []
        self.log = log

    def __call__(self, relationships_by_id):
        self.rid_map = relationships_by_id

    def read_image_data(self, fname, base=None):
        if fname.startswith('file://'):
            src = fname[len('file://'):]
            if iswindows and src and src[0] == '/':
                src = src[1:]
            if not src or not os.path.exists(src):
                raise LinkedImageNotFound(src)
            with open(src, 'rb') as rawsrc:
                raw = rawsrc.read()
        else:
            try:
                raw = self.docx.read(fname)
            except KeyError:
                raise LinkedImageNotFound(fname)
        base = base or image_filename(fname.rpartition('/')[-1]) or 'image'
        ext = what(None, raw) or base.rpartition('.')[-1] or 'jpeg'
        if ext == 'emf':
            # For an example, see: https://bugs.launchpad.net/bugs/1224849
            self.log('Found an EMF image: %s, trying to extract embedded raster image' % fname)
            from calibre.utils.wmf.emf import emf_unwrap
            try:
                raw = emf_unwrap(raw)
            except Exception:
                self.log.exception('Failed to extract embedded raster image from EMF')
            else:
                ext = 'png'
        base = base.rpartition('.')[0]
        if not base:
            base = 'image'
        base += '.' + ext
        return raw, base

    def unique_name(self, base):
        exists = frozenset(itervalues(self.used))
        c = 1
        name = base
        while name in exists:
            n, e = base.rpartition('.')[0::2]
            name = '%s-%d.%s' % (n, c, e)
            c += 1
        return name

    def resize_image(self, raw, base, max_width, max_height):
        resized, img = resize_to_fit(raw, max_width, max_height)
        if resized:
            base, ext = os.path.splitext(base)
            base = base + '-%dx%d%s' % (max_width, max_height, ext)
            raw = image_to_data(img, fmt=ext[1:])
        return raw, base, resized

    def generate_filename(self, rid, base=None, rid_map=None, max_width=None, max_height=None):
        rid_map = self.rid_map if rid_map is None else rid_map
        fname = rid_map[rid]
        key = (fname, max_width, max_height)
        ans = self.used.get(key)
        if ans is not None:
            return ans
        raw, base = self.read_image_data(fname, base=base)
        resized = False
        if max_width is not None and max_height is not None:
            raw, base, resized = self.resize_image(raw, base, max_width, max_height)
        name = self.unique_name(base)
        self.used[key] = name
        if max_width is not None and max_height is not None and not resized:
            okey = (fname, None, None)
            if okey in self.used:
                return self.used[okey]
            self.used[okey] = name
        with open(os.path.join(self.dest_dir, name), 'wb') as f:
            f.write(raw)
        self.all_images.add('images/' + name)
        return name

    def pic_to_img(self, pic, alt, parent, title):
        XPath, get = self.namespace.XPath, self.namespace.get
        name = None
        link = None
        for hl in XPath('descendant::a:hlinkClick[@r:id]')(parent):
            link = {'id':get(hl, 'r:id')}
            tgt = hl.get('tgtFrame', None)
            if tgt:
                link['target'] = tgt
            title = hl.get('tooltip', None)
            if title:
                link['title'] = title

        for pr in XPath('descendant::pic:cNvPr')(pic):
            name = pr.get('name', None)
            if name:
                name = image_filename(name)
            alt = pr.get('descr') or alt
        for a in XPath('descendant::a:blip[@r:embed or @r:link]')(pic):
            rid = get(a, 'r:embed')
            if not rid:
                rid = get(a, 'r:link')
            if rid and rid in self.rid_map:
                try:
                    src = self.generate_filename(rid, name)
                except LinkedImageNotFound as err:
                    self.log.warn('Linked image: %s not found, ignoring' % err.fname)
                    continue
                img = IMG(src='images/%s' % src)
                img.set('alt', alt or 'Image')
                if title:
                    img.set('title', title)
                if link is not None:
                    self.links.append((img, link, self.rid_map))
                return img

    def drawing_to_html(self, drawing, page):
        XPath, get = self.namespace.XPath, self.namespace.get
        # First process the inline pictures
        for inline in XPath('./wp:inline')(drawing):
            style, alt, title = get_image_properties(inline, XPath, get)
            for pic in XPath('descendant::pic:pic')(inline):
                ans = self.pic_to_img(pic, alt, inline, title)
                if ans is not None:
                    if style:
                        ans.set('style', '; '.join('%s: %s' % (k, v) for k, v in iteritems(style)))
                    yield ans

        # Now process the floats
        for anchor in XPath('./wp:anchor')(drawing):
            style, alt, title = get_image_properties(anchor, XPath, get)
            self.get_float_properties(anchor, style, page)
            for pic in XPath('descendant::pic:pic')(anchor):
                ans = self.pic_to_img(pic, alt, anchor, title)
                if ans is not None:
                    if style:
                        ans.set('style', '; '.join('%s: %s' % (k, v) for k, v in iteritems(style)))
                    yield ans

    def pict_to_html(self, pict, page):
        XPath, get = self.namespace.XPath, self.namespace.get
        # First see if we have an <hr>
        is_hr = len(pict) == 1 and get(pict[0], 'o:hr') in {'t', 'true'}
        if is_hr:
            style = {}
            hr = HR()
            try:
                pct = float(get(pict[0], 'o:hrpct'))
            except (ValueError, TypeError, AttributeError):
                pass
            else:
                if pct > 0:
                    style['width'] = '%.3g%%' % pct
            align = get(pict[0], 'o:hralign', 'center')
            if align in {'left', 'right'}:
                style['margin-left'] = '0' if align == 'left' else 'auto'
                style['margin-right'] = 'auto' if align == 'left' else '0'
            if style:
                hr.set('style', '; '.join(('%s:%s' % (k, v) for k, v in iteritems(style))))
            yield hr

        for imagedata in XPath('descendant::v:imagedata[@r:id]')(pict):
            rid = get(imagedata, 'r:id')
            if rid in self.rid_map:
                try:
                    src = self.generate_filename(rid)
                except LinkedImageNotFound as err:
                    self.log.warn('Linked image: %s not found, ignoring' % err.fname)
                    continue
                img = IMG(src='images/%s' % src, style="display:block")
                alt = get(imagedata, 'o:title')
                img.set('alt', alt or 'Image')
                yield img

    def get_float_properties(self, anchor, style, page):
        XPath, get = self.namespace.XPath, self.namespace.get
        if 'display' not in style:
            style['display'] = 'block'
        padding = get_image_margins(anchor)
        width = float(style.get('width', '100pt')[:-2])

        page_width = page.width - page.margin_left - page.margin_right
        if page_width <= 0:
            # Ignore margins
            page_width = page.width

        hpos = get_hpos(anchor, page_width, XPath, get, width/(2*page_width))

        wrap_elem = None
        dofloat = False

        for child in reversed(anchor):
            bt = barename(child.tag)
            if bt in {'wrapNone', 'wrapSquare', 'wrapThrough', 'wrapTight', 'wrapTopAndBottom'}:
                wrap_elem = child
                dofloat = bt not in {'wrapNone', 'wrapTopAndBottom'}
                break

        if wrap_elem is not None:
            padding.update(get_image_margins(wrap_elem))
            wt = wrap_elem.get('wrapText', None)
            hpos = 0 if wt == 'right' else 1 if wt == 'left' else hpos
            if dofloat:
                style['float'] = 'left' if hpos < 0.65 else 'right'
            else:
                ml, mr = (None, None) if hpos < 0.34 else ('auto', None) if hpos > 0.65 else ('auto', 'auto')
                if ml is not None:
                    style['margin-left'] = ml
                if mr is not None:
                    style['margin-right'] = mr

        style.update(padding)

    def to_html(self, elem, page, docx, dest_dir):
        dest = os.path.join(dest_dir, 'images')
        if not os.path.exists(dest):
            os.mkdir(dest)
        self.dest_dir, self.docx = dest, docx
        if elem.tag.endswith('}drawing'):
            for tag in self.drawing_to_html(elem, page):
                yield tag
        else:
            for tag in self.pict_to_html(elem, page):
                yield tag
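The `emu_to_pt`/`pt_to_emu` helpers above rely on the standard OOXML drawing units: 914400 EMU (English Metric Units) per inch and 72 pt per inch, hence 12700 EMU per point. A quick standalone sanity check of that arithmetic:

```python
# Sanity check of the EMU conversions used above (standard OOXML units:
# 914400 EMU per inch, 72 pt per inch, so 12700 EMU per pt).
def emu_to_pt(x):
    return x / 12700

def pt_to_emu(x):
    return int(x * 12700)

print(emu_to_pt(914400))  # -> 72.0 (one inch)
print(pt_to_emu(10))      # -> 127000
```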

ebook_converter/ebooks/docx/index.py (new file, 273 lines)
@@ -0,0 +1,273 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2014, Kovid Goyal <kovid at kovidgoyal.net>'

from operator import itemgetter

from lxml import etree

from calibre.utils.icu import partition_by_first_letter, sort_key
from polyglot.builtins import iteritems, filter


def get_applicable_xe_fields(index, xe_fields, XPath, expand):
    iet = index.get('entry-type', None)
    xe_fields = [xe for xe in xe_fields if xe.get('entry-type', None) == iet]

    lr = index.get('letter-range', None)
    if lr is not None:
        sl, el = lr.partition('-')[0::2]
        sl, el = sl.strip(), el.strip()
        if sl and el:
            def inrange(text):
                return sl <= text[0] <= el
            xe_fields = [xe for xe in xe_fields if inrange(xe.get('text', ''))]

    bmark = index.get('bookmark', None)
    if bmark is None:
        return xe_fields
    attr = expand('w:name')
    bookmarks = {b for b in XPath('//w:bookmarkStart')(xe_fields[0]['start_elem']) if b.get(attr, None) == bmark}
    ancestors = XPath('ancestor::w:bookmarkStart')

    def contained(xe):
        # Check if the xe field is contained inside a bookmark with the
        # specified name
        return bool(set(ancestors(xe['start_elem'])) & bookmarks)

    return [xe for xe in xe_fields if contained(xe)]


def make_block(expand, style, parent, pos):
    p = parent.makeelement(expand('w:p'))
    parent.insert(pos, p)
    if style is not None:
        ppr = p.makeelement(expand('w:pPr'))
        p.append(ppr)
        ps = ppr.makeelement(expand('w:pStyle'))
        ppr.append(ps)
        ps.set(expand('w:val'), style)
    r = p.makeelement(expand('w:r'))
    p.append(r)
    t = r.makeelement(expand('w:t'))
    t.set(expand('xml:space'), 'preserve')
    r.append(t)
    return p, t


def add_xe(xe, t, expand):
    run = t.getparent()
    idx = run.index(t)
    t.text = xe.get('text') or ' '
    pt = xe.get('page-number-text', None)

    if pt:
        p = t.getparent().getparent()
        r = p.makeelement(expand('w:r'))
        p.append(r)
        t2 = r.makeelement(expand('w:t'))
        t2.set(expand('xml:space'), 'preserve')
        t2.text = ' [%s]' % pt
        r.append(t2)
    # put separate entries on separate lines
    run.insert(idx + 1, run.makeelement(expand('w:br')))
    return xe['anchor'], run


def process_index(field, index, xe_fields, log, XPath, expand):
    '''
    We remove all the word generated index markup and replace it with our own
    that is more suitable for an ebook.
    '''
    styles = []
    heading_text = index.get('heading', None)
    heading_style = 'IndexHeading'
    start_pos = None
    for elem in field.contents:
        if elem.tag.endswith('}p'):
            s = XPath('descendant::pStyle/@w:val')(elem)
            if s:
                styles.append(s[0])
            p = elem.getparent()
            if start_pos is None:
                start_pos = (p, p.index(elem))
            p.remove(elem)

    xe_fields = get_applicable_xe_fields(index, xe_fields, XPath, expand)
    if not xe_fields:
        return [], []
    if heading_text is not None:
        groups = partition_by_first_letter(xe_fields, key=itemgetter('text'))
        items = []
        for key, fields in iteritems(groups):
            items.append(key), items.extend(fields)
        if styles:
            heading_style = styles[0]
    else:
        items = sorted(xe_fields, key=lambda x:sort_key(x['text']))

    hyperlinks = []
    blocks = []
    for item in reversed(items):
        is_heading = not isinstance(item, dict)
        style = heading_style if is_heading else None
        p, t = make_block(expand, style, *start_pos)
        if is_heading:
            text = heading_text
            if text.lower().startswith('a'):
                text = item + text[1:]
            t.text = text
        else:
            hyperlinks.append(add_xe(item, t, expand))
        blocks.append(p)

    return hyperlinks, blocks


def split_up_block(block, a, text, parts, ldict):
    prefix = parts[:-1]
    a.text = parts[-1]
    parent = a.getparent()
    style = 'display:block; margin-left: %.3gem'
    for i, prefix in enumerate(prefix):
        m = 1.5 * i
        span = parent.makeelement('span', style=style % m)
        ldict[span] = i
        parent.append(span)
        span.text = prefix
    span = parent.makeelement('span', style=style % ((i + 1) * 1.5))
    parent.append(span)
    span.append(a)
    ldict[span] = len(prefix)


"""
The merge algorithm is a little tricky.
We start with a list of elementary blocks. Each is an HtmlElement, a p node
with a list of child nodes. The last child may be a link, and the earlier ones are
just text.
The list is in reverse order from what we want in the index.
There is a dictionary ldict which records the level of each child node.

Now we want to do a reduce-like operation, combining all blocks with the same
top level index entry into a single block representing the structure of all
references, subentries, etc. under that top entry.
|
||||||
|
Here's the algorithm.
|
||||||
|
|
||||||
|
Given a block p and the next block n, and the top level entries p1 and n1 in each
|
||||||
|
block, which we assume have the same text:
|
||||||
|
|
||||||
|
Start with (p, p1) and (n, n1).
|
||||||
|
|
||||||
|
Given (p, p1, ..., pk) and (n, n1, ..., nk) which we want to merge:
|
||||||
|
|
||||||
|
If there are no more levels in n, and we have a link in nk,
|
||||||
|
then add the link from nk to the links for pk.
|
||||||
|
This might be the first link for pk, or we might get a list of references.
|
||||||
|
|
||||||
|
Otherwise nk+1 is the next level in n. Look for a matching entry in p. It must have
|
||||||
|
the same text, it must follow pk, it must come before we find any other p entries at
|
||||||
|
the same level as pk, and it must have the same level as nk+1.
|
||||||
|
|
||||||
|
If we find such a matching entry, go back to the start with (p ... pk+1) and (n ... nk+1).
|
||||||
|
|
||||||
|
If there is no matching entry, then because of the original reversed order we want
|
||||||
|
to insert nk+1 and all following entries from n into p immediately following pk.
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
def find_match(prev_block, pind, nextent, ldict):
|
||||||
|
curlevel = ldict.get(prev_block[pind], -1)
|
||||||
|
if curlevel < 0:
|
||||||
|
return -1
|
||||||
|
for p in range(pind+1, len(prev_block)):
|
||||||
|
trylev = ldict.get(prev_block[p], -1)
|
||||||
|
if trylev <= curlevel:
|
||||||
|
return -1
|
||||||
|
if trylev > (curlevel+1):
|
||||||
|
continue
|
||||||
|
if prev_block[p].text_content() == nextent.text_content():
|
||||||
|
return p
|
||||||
|
return -1
|
||||||
|
|
||||||
|
|
||||||
|
def add_link(pent, nent, ldict):
|
||||||
|
na = nent.xpath('descendant::a[1]')
|
||||||
|
# If there is no link, leave it as text
|
||||||
|
if not na or len(na) == 0:
|
||||||
|
return
|
||||||
|
na = na[0]
|
||||||
|
pa = pent.xpath('descendant::a')
|
||||||
|
if pa and len(pa) > 0:
|
||||||
|
# Put on same line with a comma
|
||||||
|
pa = pa[-1]
|
||||||
|
pa.tail = ', '
|
||||||
|
p = pa.getparent()
|
||||||
|
p.insert(p.index(pa) + 1, na)
|
||||||
|
else:
|
||||||
|
# substitute link na for plain text in pent
|
||||||
|
pent.text = ""
|
||||||
|
pent.append(na)
|
||||||
|
|
||||||
|
|
||||||
|
def merge_blocks(prev_block, next_block, pind, nind, next_path, ldict):
|
||||||
|
# First elements match. Any more in next?
|
||||||
|
if len(next_path) == (nind + 1):
|
||||||
|
nextent = next_block[nind]
|
||||||
|
add_link(prev_block[pind], nextent, ldict)
|
||||||
|
return
|
||||||
|
|
||||||
|
nind = nind + 1
|
||||||
|
nextent = next_block[nind]
|
||||||
|
prevent = find_match(prev_block, pind, nextent, ldict)
|
||||||
|
if prevent > 0:
|
||||||
|
merge_blocks(prev_block, next_block, prevent, nind, next_path, ldict)
|
||||||
|
return
|
||||||
|
|
||||||
|
# Want to insert elements into previous block
|
||||||
|
while nind < len(next_block):
|
||||||
|
# insert takes it out of old
|
||||||
|
pind = pind + 1
|
||||||
|
prev_block.insert(pind, next_block[nind])
|
||||||
|
|
||||||
|
next_block.getparent().remove(next_block)
|
||||||
|
|
||||||
|
|
||||||
|
def polish_index_markup(index, blocks):
|
||||||
|
# Blocks are in reverse order at this point
|
||||||
|
path_map = {}
|
||||||
|
ldict = {}
|
||||||
|
for block in blocks:
|
||||||
|
cls = block.get('class', '') or ''
|
||||||
|
block.set('class', (cls + ' index-entry').lstrip())
|
||||||
|
a = block.xpath('descendant::a[1]')
|
||||||
|
text = ''
|
||||||
|
if a:
|
||||||
|
text = etree.tostring(a[0], method='text', with_tail=False, encoding='unicode').strip()
|
||||||
|
if ':' in text:
|
||||||
|
path_map[block] = parts = list(filter(None, (x.strip() for x in text.split(':'))))
|
||||||
|
if len(parts) > 1:
|
||||||
|
split_up_block(block, a[0], text, parts, ldict)
|
||||||
|
else:
|
||||||
|
# try using a span all the time
|
||||||
|
path_map[block] = [text]
|
||||||
|
parent = a[0].getparent()
|
||||||
|
span = parent.makeelement('span', style='display:block; margin-left: 0em')
|
||||||
|
parent.append(span)
|
||||||
|
span.append(a[0])
|
||||||
|
ldict[span] = 0
|
||||||
|
|
||||||
|
for br in block.xpath('descendant::br'):
|
||||||
|
br.tail = None
|
||||||
|
|
||||||
|
# We want a single block for each main entry
|
||||||
|
prev_block = blocks[0]
|
||||||
|
for block in blocks[1:]:
|
||||||
|
pp, pn = path_map[prev_block], path_map[block]
|
||||||
|
if pp[0] == pn[0]:
|
||||||
|
merge_blocks(prev_block, block, 0, 0, pn, ldict)
|
||||||
|
else:
|
||||||
|
prev_block = block
|
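The matching rule in the docstring above ("same text, deeper by exactly one level, before any sibling of pk") can be hard to picture. Here is a toy illustration of `find_match` that uses plain strings instead of lxml elements, with `ldict` giving each child's indent level; the names `block` and the entry strings are made up for illustration.

```python
# Toy version of find_match: entries are strings rather than lxml
# elements, and text comparison replaces text_content().
def find_match(prev_block, pind, nextent, ldict):
    curlevel = ldict.get(prev_block[pind], -1)
    if curlevel < 0:
        return -1
    for p in range(pind + 1, len(prev_block)):
        trylev = ldict.get(prev_block[p], -1)
        if trylev <= curlevel:       # left the subtree rooted at pind
            return -1
        if trylev > (curlevel + 1):  # grandchild, skip it
            continue
        if prev_block[p] == nextent:
            return p
    return -1

# 'animals' (level 0) has two level-1 subentries; 'plants' is a sibling
# top-level entry, so the scan stops before reaching it.
block = ['animals', 'cats', 'birds', 'plants']
ldict = {'animals': 0, 'cats': 1, 'birds': 1, 'plants': 0}
print(find_match(block, 0, 'cats', ldict))    # 1
print(find_match(block, 0, 'plants', ldict))  # -1: same level, not deeper
```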
144 ebook_converter/ebooks/docx/names.py Normal file
@@ -0,0 +1,144 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import re

from lxml.etree import XPath as X

from calibre.utils.filenames import ascii_text
from polyglot.builtins import iteritems

# Names {{{
TRANSITIONAL_NAMES = {
    'DOCUMENT'      : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument',
    'DOCPROPS'      : 'http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties',
    'APPPROPS'      : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties',
    'STYLES'        : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles',
    'NUMBERING'     : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering',
    'FONTS'         : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable',
    'EMBEDDED_FONT' : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/font',
    'IMAGES'        : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/image',
    'LINKS'         : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink',
    'FOOTNOTES'     : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/footnotes',
    'ENDNOTES'      : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/endnotes',
    'THEMES'        : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme',
    'SETTINGS'      : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings',
    'WEB_SETTINGS'  : 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings',
}

STRICT_NAMES = {
    k:v.replace('http://schemas.openxmlformats.org/officeDocument/2006', 'http://purl.oclc.org/ooxml/officeDocument')
    for k, v in iteritems(TRANSITIONAL_NAMES)
}

TRANSITIONAL_NAMESPACES = {
    'mo': 'http://schemas.microsoft.com/office/mac/office/2008/main',
    'o': 'urn:schemas-microsoft-com:office:office',
    've': 'http://schemas.openxmlformats.org/markup-compatibility/2006',
    'mc': 'http://schemas.openxmlformats.org/markup-compatibility/2006',
    # Text Content
    'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main',
    'w10': 'urn:schemas-microsoft-com:office:word',
    'wne': 'http://schemas.microsoft.com/office/word/2006/wordml',
    'xml': 'http://www.w3.org/XML/1998/namespace',
    # Drawing
    'a': 'http://schemas.openxmlformats.org/drawingml/2006/main',
    'm': 'http://schemas.openxmlformats.org/officeDocument/2006/math',
    'mv': 'urn:schemas-microsoft-com:mac:vml',
    'pic': 'http://schemas.openxmlformats.org/drawingml/2006/picture',
    'v': 'urn:schemas-microsoft-com:vml',
    'wp': 'http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing',
    # Properties (core and extended)
    'cp': 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'ep': 'http://schemas.openxmlformats.org/officeDocument/2006/extended-properties',
    'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
    # Content Types
    'ct': 'http://schemas.openxmlformats.org/package/2006/content-types',
    # Package Relationships
    'r': 'http://schemas.openxmlformats.org/officeDocument/2006/relationships',
    'pr': 'http://schemas.openxmlformats.org/package/2006/relationships',
    # Dublin Core document properties
    'dcmitype': 'http://purl.org/dc/dcmitype/',
    'dcterms': 'http://purl.org/dc/terms/'
}

STRICT_NAMESPACES = {
    k:v.replace(
        'http://schemas.openxmlformats.org/officeDocument/2006', 'http://purl.oclc.org/ooxml/officeDocument').replace(
        'http://schemas.openxmlformats.org/wordprocessingml/2006', 'http://purl.oclc.org/ooxml/wordprocessingml').replace(
        'http://schemas.openxmlformats.org/drawingml/2006', 'http://purl.oclc.org/ooxml/drawingml')
    for k, v in iteritems(TRANSITIONAL_NAMESPACES)
}
# }}}


def barename(x):
    return x.rpartition('}')[-1]


def XML(x):
    return '{%s}%s' % (TRANSITIONAL_NAMESPACES['xml'], x)


def generate_anchor(name, existing):
    x = y = 'id_' + re.sub(r'[^0-9a-zA-Z_]', '', ascii_text(name)).lstrip('_')
    c = 1
    while y in existing:
        y = '%s_%d' % (x, c)
        c += 1
    return y


class DOCXNamespace(object):

    def __init__(self, transitional=True):
        self.xpath_cache = {}
        if transitional:
            self.namespaces = TRANSITIONAL_NAMESPACES.copy()
            self.names = TRANSITIONAL_NAMES.copy()
        else:
            self.namespaces = STRICT_NAMESPACES.copy()
            self.names = STRICT_NAMES.copy()

    def XPath(self, expr):
        ans = self.xpath_cache.get(expr, None)
        if ans is None:
            self.xpath_cache[expr] = ans = X(expr, namespaces=self.namespaces)
        return ans

    def is_tag(self, x, q):
        tag = getattr(x, 'tag', x)
        ns, name = q.partition(':')[0::2]
        return '{%s}%s' % (self.namespaces.get(ns, None), name) == tag

    def expand(self, name, sep=':'):
        ns, tag = name.partition(sep)[::2]
        if ns and tag:
            tag = '{%s}%s' % (self.namespaces[ns], tag)
        return tag or ns

    def get(self, x, attr, default=None):
        return x.attrib.get(self.expand(attr), default)

    def ancestor(self, elem, name):
        try:
            return self.XPath('ancestor::%s[1]' % name)(elem)[0]
        except IndexError:
            return None

    def children(self, elem, *args):
        return self.XPath('|'.join('child::%s' % a for a in args))(elem)

    def descendants(self, elem, *args):
        return self.XPath('|'.join('descendant::%s' % a for a in args))(elem)

    def makeelement(self, root, tag, append=True, **attrs):
        ans = root.makeelement(self.expand(tag), **{self.expand(k, sep='_'):v for k, v in iteritems(attrs)})
        if append:
            root.append(ans)
        return ans
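`DOCXNamespace.expand` is the workhorse of names.py: it turns a prefixed name like `w:p` into the `{namespace-uri}localname` Clark notation that lxml uses for tags and attributes. A standalone sketch with a trimmed namespace table (only the `w` and `xml` prefixes from the module's map):

```python
# Standalone sketch of DOCXNamespace.expand with a reduced namespace map.
W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
XML_NS = 'http://www.w3.org/XML/1998/namespace'
namespaces = {'w': W, 'xml': XML_NS}

def expand(name, sep=':'):
    # str.partition never raises: with no separator present, ns holds the
    # whole name and tag is empty, so bare names pass through unchanged.
    ns, tag = name.partition(sep)[::2]
    if ns and tag:
        tag = '{%s}%s' % (namespaces[ns], tag)
    return tag or ns

print(expand('w:p'))        # Clark name for a paragraph element
print(expand('xml:space'))  # attribute names expand the same way
print(expand('body'))       # unprefixed names are returned as-is
```

The `sep='_'` variant used by `makeelement` lets callers spell attributes as Python keyword arguments (`w_val='x'`) that cannot contain a colon.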
388 ebook_converter/ebooks/docx/numbering.py Normal file
@@ -0,0 +1,388 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import re, string
from collections import Counter, defaultdict
from functools import partial

from lxml.html.builder import OL, UL, SPAN

from calibre.ebooks.docx.block_styles import ParagraphStyle
from calibre.ebooks.docx.char_styles import RunStyle, inherit
from calibre.ebooks.metadata import roman
from polyglot.builtins import iteritems, unicode_type

STYLE_MAP = {
    'aiueo': 'hiragana',
    'aiueoFullWidth': 'hiragana',
    'hebrew1': 'hebrew',
    'iroha': 'katakana-iroha',
    'irohaFullWidth': 'katakana-iroha',
    'lowerLetter': 'lower-alpha',
    'lowerRoman': 'lower-roman',
    'none': 'none',
    'upperLetter': 'upper-alpha',
    'upperRoman': 'upper-roman',
    'chineseCounting': 'cjk-ideographic',
    'decimalZero': 'decimal-leading-zero',
}


def alphabet(val, lower=True):
    x = string.ascii_lowercase if lower else string.ascii_uppercase
    return x[(abs(val - 1)) % len(x)]


alphabet_map = {
    'lower-alpha':alphabet, 'upper-alpha':partial(alphabet, lower=False),
    'lower-roman':lambda x:roman(x).lower(), 'upper-roman':roman,
    'decimal-leading-zero': lambda x: '0%d' % x
}


class Level(object):

    def __init__(self, namespace, lvl=None):
        self.namespace = namespace
        self.restart = None
        self.start = 0
        self.fmt = 'decimal'
        self.para_link = None
        self.paragraph_style = self.character_style = None
        self.is_numbered = False
        self.num_template = None
        self.bullet_template = None
        self.pic_id = None

        if lvl is not None:
            self.read_from_xml(lvl)

    def copy(self):
        ans = Level(self.namespace)
        for x in ('restart', 'pic_id', 'start', 'fmt', 'para_link', 'paragraph_style', 'character_style', 'is_numbered', 'num_template', 'bullet_template'):
            setattr(ans, x, getattr(self, x))
        return ans

    def format_template(self, counter, ilvl, template):
        def sub(m):
            x = int(m.group(1)) - 1
            if x > ilvl or x not in counter:
                return ''
            val = counter[x] - (0 if x == ilvl else 1)
            formatter = alphabet_map.get(self.fmt, lambda x: '%d' % x)
            return formatter(val)

        return re.sub(r'%(\d+)', sub, template).rstrip() + '\xa0'

    def read_from_xml(self, lvl, override=False):
        XPath, get = self.namespace.XPath, self.namespace.get
        for lr in XPath('./w:lvlRestart[@w:val]')(lvl):
            try:
                self.restart = int(get(lr, 'w:val'))
            except (TypeError, ValueError):
                pass

        for lr in XPath('./w:start[@w:val]')(lvl):
            try:
                self.start = int(get(lr, 'w:val'))
            except (TypeError, ValueError):
                pass

        for rPr in XPath('./w:rPr')(lvl):
            ps = RunStyle(self.namespace, rPr)
            if self.character_style is None:
                self.character_style = ps
            else:
                self.character_style.update(ps)

        lt = None
        for lr in XPath('./w:lvlText[@w:val]')(lvl):
            lt = get(lr, 'w:val')

        for lr in XPath('./w:numFmt[@w:val]')(lvl):
            val = get(lr, 'w:val')
            if val == 'bullet':
                self.is_numbered = False
                cs = self.character_style
                if lt in {'\uf0a7', 'o'} or (
                        cs is not None and cs.font_family is not inherit and cs.font_family.lower() in {'wingdings', 'symbol'}):
                    self.fmt = {'\uf0a7':'square', 'o':'circle'}.get(lt, 'disc')
                else:
                    self.bullet_template = lt
                for lpid in XPath('./w:lvlPicBulletId[@w:val]')(lvl):
                    self.pic_id = get(lpid, 'w:val')
            else:
                self.is_numbered = True
                self.fmt = STYLE_MAP.get(val, 'decimal')
                if lt and re.match(r'%\d+\.$', lt) is None:
                    self.num_template = lt

        for lr in XPath('./w:pStyle[@w:val]')(lvl):
            self.para_link = get(lr, 'w:val')

        for pPr in XPath('./w:pPr')(lvl):
            ps = ParagraphStyle(self.namespace, pPr)
            if self.paragraph_style is None:
                self.paragraph_style = ps
            else:
                self.paragraph_style.update(ps)

    def css(self, images, pic_map, rid_map):
        ans = {'list-style-type': self.fmt}
        if self.pic_id:
            rid = pic_map.get(self.pic_id, None)
            if rid:
                try:
                    fname = images.generate_filename(rid, rid_map=rid_map, max_width=20, max_height=20)
                except Exception:
                    fname = None
                else:
                    ans['list-style-image'] = 'url("images/%s")' % fname
        return ans

    def char_css(self):
        try:
            css = self.character_style.css
        except AttributeError:
            css = {}
        css.pop('font-family', None)
        return css


class NumberingDefinition(object):

    def __init__(self, namespace, parent=None, an_id=None):
        self.namespace = namespace
        XPath, get = self.namespace.XPath, self.namespace.get
        self.levels = {}
        self.abstract_numbering_definition_id = an_id
        if parent is not None:
            for lvl in XPath('./w:lvl')(parent):
                try:
                    ilvl = int(get(lvl, 'w:ilvl', 0))
                except (TypeError, ValueError):
                    ilvl = 0
                self.levels[ilvl] = Level(namespace, lvl)

    def copy(self):
        ans = NumberingDefinition(self.namespace, an_id=self.abstract_numbering_definition_id)
        for l, lvl in iteritems(self.levels):
            ans.levels[l] = lvl.copy()
        return ans


class Numbering(object):

    def __init__(self, namespace):
        self.namespace = namespace
        self.definitions = {}
        self.instances = {}
        self.counters = defaultdict(Counter)
        self.starts = {}
        self.pic_map = {}

    def __call__(self, root, styles, rid_map):
        ' Read all numbering style definitions '
        XPath, get = self.namespace.XPath, self.namespace.get
        self.rid_map = rid_map
        for npb in XPath('./w:numPicBullet[@w:numPicBulletId]')(root):
            npbid = get(npb, 'w:numPicBulletId')
            for idata in XPath('descendant::v:imagedata[@r:id]')(npb):
                rid = get(idata, 'r:id')
                self.pic_map[npbid] = rid
        lazy_load = {}
        for an in XPath('./w:abstractNum[@w:abstractNumId]')(root):
            an_id = get(an, 'w:abstractNumId')
            nsl = XPath('./w:numStyleLink[@w:val]')(an)
            if nsl:
                lazy_load[an_id] = get(nsl[0], 'w:val')
            else:
                nd = NumberingDefinition(self.namespace, an, an_id=an_id)
                self.definitions[an_id] = nd

        def create_instance(n, definition):
            nd = definition.copy()
            start_overrides = {}
            for lo in XPath('./w:lvlOverride')(n):
                try:
                    ilvl = int(get(lo, 'w:ilvl'))
                except (ValueError, TypeError):
                    ilvl = None
                for so in XPath('./w:startOverride[@w:val]')(lo):
                    try:
                        start_override = int(get(so, 'w:val'))
                    except (TypeError, ValueError):
                        pass
                    else:
                        start_overrides[ilvl] = start_override
                for lvl in XPath('./w:lvl')(lo)[:1]:
                    nilvl = get(lvl, 'w:ilvl')
                    ilvl = nilvl if ilvl is None else ilvl
                    alvl = nd.levels.get(ilvl, None)
                    if alvl is None:
                        alvl = Level(self.namespace)
                    alvl.read_from_xml(lvl, override=True)
            for ilvl, so in iteritems(start_overrides):
                try:
                    # apply this level's own override, not the last value
                    # left over from the collection loop above
                    nd.levels[ilvl].start = so
                except KeyError:
                    pass
            return nd

        next_pass = {}
        for n in XPath('./w:num[@w:numId]')(root):
            an_id = None
            num_id = get(n, 'w:numId')
            for an in XPath('./w:abstractNumId[@w:val]')(n):
                an_id = get(an, 'w:val')
            d = self.definitions.get(an_id, None)
            if d is None:
                next_pass[num_id] = (an_id, n)
                continue
            self.instances[num_id] = create_instance(n, d)

        numbering_links = styles.numbering_style_links
        for an_id, style_link in iteritems(lazy_load):
            num_id = numbering_links[style_link]
            self.definitions[an_id] = self.instances[num_id].copy()

        for num_id, (an_id, n) in iteritems(next_pass):
            d = self.definitions.get(an_id, None)
            if d is not None:
                self.instances[num_id] = create_instance(n, d)

        for num_id, d in iteritems(self.instances):
            self.starts[num_id] = {lvl:d.levels[lvl].start for lvl in d.levels}

    def get_pstyle(self, num_id, style_id):
        d = self.instances.get(num_id, None)
        if d is not None:
            for ilvl, lvl in iteritems(d.levels):
                if lvl.para_link == style_id:
                    return ilvl

    def get_para_style(self, num_id, lvl):
        d = self.instances.get(num_id, None)
        if d is not None:
            lvl = d.levels.get(lvl, None)
        return getattr(lvl, 'paragraph_style', None)

    def update_counter(self, counter, levelnum, levels):
        counter[levelnum] += 1
        for ilvl, lvl in iteritems(levels):
            restart = lvl.restart
            if (restart is None and ilvl == levelnum + 1) or restart == levelnum + 1:
                counter[ilvl] = lvl.start

    def apply_markup(self, items, body, styles, object_map, images):
        seen_instances = set()
        for p, num_id, ilvl in items:
            d = self.instances.get(num_id, None)
            if d is not None:
                lvl = d.levels.get(ilvl, None)
                if lvl is not None:
                    an_id = d.abstract_numbering_definition_id
                    counter = self.counters[an_id]
                    if ilvl not in counter or num_id not in seen_instances:
                        counter[ilvl] = self.starts[num_id][ilvl]
                    seen_instances.add(num_id)
                    p.tag = 'li'
                    p.set('value', '%s' % counter[ilvl])
                    p.set('list-lvl', unicode_type(ilvl))
                    p.set('list-id', num_id)
                    if lvl.num_template is not None:
                        val = lvl.format_template(counter, ilvl, lvl.num_template)
                        p.set('list-template', val)
                    elif lvl.bullet_template is not None:
                        val = lvl.format_template(counter, ilvl, lvl.bullet_template)
                        p.set('list-template', val)
                    self.update_counter(counter, ilvl, d.levels)

        templates = {}

        def commit(current_run):
            if not current_run:
                return
            start = current_run[0]
            parent = start.getparent()
            idx = parent.index(start)

            d = self.instances[start.get('list-id')]
            ilvl = int(start.get('list-lvl'))
            lvl = d.levels[ilvl]
            lvlid = start.get('list-id') + start.get('list-lvl')
            has_template = 'list-template' in start.attrib
            wrap = (OL if lvl.is_numbered or has_template else UL)('\n\t')
            if has_template:
                wrap.set('lvlid', lvlid)
            else:
                wrap.set('class', styles.register(lvl.css(images, self.pic_map, self.rid_map), 'list'))
            ccss = lvl.char_css()
            if ccss:
                ccss = styles.register(ccss, 'bullet')
            parent.insert(idx, wrap)
            last_val = None
            for child in current_run:
                wrap.append(child)
                child.tail = '\n\t'
                if has_template:
                    span = SPAN()
                    span.text = child.text
                    child.text = None
                    for gc in child:
                        span.append(gc)
                    child.append(span)
                    span = SPAN(child.get('list-template'))
                    if ccss:
                        span.set('class', ccss)
                    last = templates.get(lvlid, '')
                    if span.text and len(span.text) > len(last):
                        templates[lvlid] = span.text
                    child.insert(0, span)
                for attr in ('list-lvl', 'list-id', 'list-template'):
                    child.attrib.pop(attr, None)
                val = int(child.get('value'))
                if last_val == val - 1 or wrap.tag == 'ul' or (last_val is None and val == 1):
                    child.attrib.pop('value')
                last_val = val
            current_run[-1].tail = '\n'
            del current_run[:]

        parents = set()
        for child in body.iterdescendants('li'):
            parents.add(child.getparent())

        for parent in parents:
            current_run = []
            for child in parent:
                if child.tag == 'li':
                    if current_run:
                        last = current_run[-1]
                        if (last.get('list-id'), last.get('list-lvl')) != (child.get('list-id'), child.get('list-lvl')):
                            commit(current_run)
                    current_run.append(child)
                else:
                    commit(current_run)
            commit(current_run)

        # Convert the list items that use custom text for bullets into tables
        # so that they display correctly
        for wrap in body.xpath('//ol[@lvlid]'):
            wrap.attrib.pop('lvlid')
            wrap.tag = 'div'
            wrap.set('style', 'display:table')
            for i, li in enumerate(wrap.iterchildren('li')):
                li.tag = 'div'
                li.attrib.pop('value', None)
                li.set('style', 'display:table-row')
                obj = object_map[li]
                bs = styles.para_cache[obj]
                if i == 0:
                    wrap.set('style', 'display:table; padding-left:%s' %
                             bs.css.get('margin-left', '0'))
                    bs.css.pop('margin-left', None)
                for child in li:
                    child.set('style', 'display:table-cell')
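The counter formatters in numbering.py above map Word's `numFmt` values to CSS list styles; `alphabet()` in particular wraps around after `z` rather than switching to `aa`. A self-contained excerpt (omitting the roman-numeral entries, which need calibre's `roman` helper):

```python
# Excerpt of the counter formatters from numbering.py, runnable standalone.
import string
from functools import partial

def alphabet(val, lower=True):
    x = string.ascii_lowercase if lower else string.ascii_uppercase
    return x[(abs(val - 1)) % len(x)]

alphabet_map = {
    'lower-alpha': alphabet,
    'upper-alpha': partial(alphabet, lower=False),
    'decimal-leading-zero': lambda x: '0%d' % x,
}

print(alphabet(1))    # a
print(alphabet(26))   # z
print(alphabet(27))   # wraps back to a
print(alphabet_map['upper-alpha'](2))           # B
print(alphabet_map['decimal-leading-zero'](7))  # 07
```

`format_template` uses these to expand `%1`, `%2`, ... placeholders in a Word `lvlText` template against the current counter values.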
21 ebook_converter/ebooks/docx/settings.py Normal file
@@ -0,0 +1,21 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'


class Settings(object):

    def __init__(self, namespace):
        self.default_tab_stop = 720 / 20
        self.namespace = namespace

    def __call__(self, root):
        for dts in self.namespace.XPath('//w:defaultTabStop[@w:val]')(root):
            try:
                self.default_tab_stop = int(self.namespace.get(dts, 'w:val')) / 20
            except (ValueError, TypeError, AttributeError):
                pass
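Settings divides `w:defaultTabStop` by 20 because Word stores the value in twentieths of a point (twips); the default of 720 twips is therefore a 36 pt (half-inch) tab stop. A one-line sketch of the conversion:

```python
# Word's defaultTabStop is given in twips (1/20 pt); dividing by 20
# yields points, so the 720 default is a half-inch tab stop.
def tab_stop_pts(val_twips):
    return val_twips / 20

print(tab_stop_pts(720))  # 36.0 pt = 0.5 inch at 72 pt/inch
```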
504 ebook_converter/ebooks/docx/styles.py Normal file
@@ -0,0 +1,504 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import textwrap
from collections import OrderedDict, Counter

from calibre.ebooks.docx.block_styles import ParagraphStyle, inherit, twips
from calibre.ebooks.docx.char_styles import RunStyle
from calibre.ebooks.docx.tables import TableStyle
from polyglot.builtins import iteritems, itervalues


class PageProperties(object):

    '''
    Class representing page level properties (page size/margins) read from
    sectPr elements.
    '''

    def __init__(self, namespace, elems=()):
        self.width, self.height = 595.28, 841.89  # pts, A4
        self.margin_left = self.margin_right = 72  # pts

        def setval(attr, val):
            val = twips(val)
            if val is not None:
                setattr(self, attr, val)

        for sectPr in elems:
            for pgSz in namespace.XPath('./w:pgSz')(sectPr):
                w, h = namespace.get(pgSz, 'w:w'), namespace.get(pgSz, 'w:h')
                setval('width', w), setval('height', h)
            for pgMar in namespace.XPath('./w:pgMar')(sectPr):
                l, r = namespace.get(pgMar, 'w:left'), namespace.get(pgMar, 'w:right')
                setval('margin_left', l), setval('margin_right', r)


class Style(object):
    '''
    Class representing a <w:style> element. Can contain block, character, etc. styles.
    '''

    def __init__(self, namespace, elem):
        self.namespace = namespace
        self.name_path = namespace.XPath('./w:name[@w:val]')
        self.based_on_path = namespace.XPath('./w:basedOn[@w:val]')
        self.resolved = False
        self.style_id = namespace.get(elem, 'w:styleId')
        self.style_type = namespace.get(elem, 'w:type')
        names = self.name_path(elem)
        self.name = namespace.get(names[-1], 'w:val') if names else None
        based_on = self.based_on_path(elem)
        self.based_on = namespace.get(based_on[0], 'w:val') if based_on else None
        if self.style_type == 'numbering':
            self.based_on = None
        self.is_default = namespace.get(elem, 'w:default') in {'1', 'on', 'true'}

        self.paragraph_style = self.character_style = self.table_style = None

        if self.style_type in {'paragraph', 'character', 'table'}:
            if self.style_type == 'table':
                for tblPr in namespace.XPath('./w:tblPr')(elem):
                    ts = TableStyle(namespace, tblPr)
                    if self.table_style is None:
                        self.table_style = ts
                    else:
                        self.table_style.update(ts)
            if self.style_type in {'paragraph', 'table'}:
                for pPr in namespace.XPath('./w:pPr')(elem):
                    ps = ParagraphStyle(namespace, pPr)
                    if self.paragraph_style is None:
                        self.paragraph_style = ps
                    else:
                        self.paragraph_style.update(ps)

            for rPr in namespace.XPath('./w:rPr')(elem):
                rs = RunStyle(namespace, rPr)
                if self.character_style is None:
                    self.character_style = rs
                else:
                    self.character_style.update(rs)

        if self.style_type in {'numbering', 'paragraph'}:
            self.numbering_style_link = None
            for x in namespace.XPath('./w:pPr/w:numPr/w:numId[@w:val]')(elem):
                self.numbering_style_link = namespace.get(x, 'w:val')

    def resolve_based_on(self, parent):
        if parent.table_style is not None:
            if self.table_style is None:
                self.table_style = TableStyle(self.namespace)
|
self.table_style.resolve_based_on(parent.table_style)
|
||||||
|
if parent.paragraph_style is not None:
|
||||||
|
if self.paragraph_style is None:
|
||||||
|
self.paragraph_style = ParagraphStyle(self.namespace)
|
||||||
|
self.paragraph_style.resolve_based_on(parent.paragraph_style)
|
||||||
|
if parent.character_style is not None:
|
||||||
|
if self.character_style is None:
|
||||||
|
self.character_style = RunStyle(self.namespace)
|
||||||
|
self.character_style.resolve_based_on(parent.character_style)
|
||||||
|
|
||||||
|
|
||||||
|
class Styles(object):
|
||||||
|
|
||||||
|
'''
|
||||||
|
Collection of all styles defined in the document. Used to get the final styles applicable to elements in the document markup.
|
||||||
|
'''
|
||||||
|
|
||||||
|
def __init__(self, namespace, tables):
|
||||||
|
self.namespace = namespace
|
||||||
|
self.id_map = OrderedDict()
|
||||||
|
self.para_cache = {}
|
||||||
|
self.para_char_cache = {}
|
||||||
|
self.run_cache = {}
|
||||||
|
self.classes = {}
|
||||||
|
self.counter = Counter()
|
||||||
|
self.default_styles = {}
|
||||||
|
self.tables = tables
|
||||||
|
self.numbering_style_links = {}
|
||||||
|
self.default_paragraph_style = self.default_character_style = None
|
||||||
|
|
||||||
|
def __iter__(self):
|
||||||
|
for s in itervalues(self.id_map):
|
||||||
|
yield s
|
||||||
|
|
||||||
|
def __getitem__(self, key):
|
||||||
|
return self.id_map[key]
|
||||||
|
|
||||||
|
def __len__(self):
|
||||||
|
return len(self.id_map)
|
||||||
|
|
||||||
|
def get(self, key, default=None):
|
||||||
|
return self.id_map.get(key, default)
|
||||||
|
|
||||||
|
def __call__(self, root, fonts, theme):
|
||||||
|
self.fonts, self.theme = fonts, theme
|
||||||
|
self.default_paragraph_style = self.default_character_style = None
|
||||||
|
if root is not None:
|
||||||
|
for s in self.namespace.XPath('//w:style')(root):
|
||||||
|
s = Style(self.namespace, s)
|
||||||
|
if s.style_id:
|
||||||
|
self.id_map[s.style_id] = s
|
||||||
|
if s.is_default:
|
||||||
|
self.default_styles[s.style_type] = s
|
||||||
|
if getattr(s, 'numbering_style_link', None) is not None:
|
||||||
|
self.numbering_style_links[s.style_id] = s.numbering_style_link
|
||||||
|
|
||||||
|
for dd in self.namespace.XPath('./w:docDefaults')(root):
|
||||||
|
for pd in self.namespace.XPath('./w:pPrDefault')(dd):
|
||||||
|
for pPr in self.namespace.XPath('./w:pPr')(pd):
|
||||||
|
ps = ParagraphStyle(self.namespace, pPr)
|
||||||
|
if self.default_paragraph_style is None:
|
||||||
|
self.default_paragraph_style = ps
|
||||||
|
else:
|
||||||
|
self.default_paragraph_style.update(ps)
|
||||||
|
for pd in self.namespace.XPath('./w:rPrDefault')(dd):
|
||||||
|
for pPr in self.namespace.XPath('./w:rPr')(pd):
|
||||||
|
ps = RunStyle(self.namespace, pPr)
|
||||||
|
if self.default_character_style is None:
|
||||||
|
self.default_character_style = ps
|
||||||
|
else:
|
||||||
|
self.default_character_style.update(ps)
|
||||||
|
|
||||||
|
def resolve(s, p):
|
||||||
|
if p is not None:
|
||||||
|
if not p.resolved:
|
||||||
|
resolve(p, self.get(p.based_on))
|
||||||
|
s.resolve_based_on(p)
|
||||||
|
s.resolved = True
|
||||||
|
|
||||||
|
for s in self:
|
||||||
|
if not s.resolved:
|
||||||
|
resolve(s, self.get(s.based_on))
|
||||||
|
|
||||||
|
def para_val(self, parent_styles, direct_formatting, attr):
|
||||||
|
val = getattr(direct_formatting, attr)
|
||||||
|
if val is inherit:
|
||||||
|
for ps in reversed(parent_styles):
|
||||||
|
pval = getattr(ps, attr)
|
||||||
|
if pval is not inherit:
|
||||||
|
val = pval
|
||||||
|
break
|
||||||
|
return val
|
||||||
|
|
||||||
|
def run_val(self, parent_styles, direct_formatting, attr):
|
||||||
|
val = getattr(direct_formatting, attr)
|
||||||
|
if val is not inherit:
|
||||||
|
return val
|
||||||
|
if attr in direct_formatting.toggle_properties:
|
||||||
|
# The spec (section 17.7.3) does not make sense, so we follow the behavior
|
||||||
|
# of Word, which seems to only consider the document default if the
|
||||||
|
# property has not been defined in any styles.
|
||||||
|
vals = [int(getattr(rs, attr)) for rs in parent_styles if rs is not self.default_character_style and getattr(rs, attr) is not inherit]
|
||||||
|
if vals:
|
||||||
|
return sum(vals) % 2 == 1
|
||||||
|
if self.default_character_style is not None:
|
||||||
|
return getattr(self.default_character_style, attr) is True
|
||||||
|
return False
|
||||||
|
for rs in reversed(parent_styles):
|
||||||
|
rval = getattr(rs, attr)
|
||||||
|
if rval is not inherit:
|
||||||
|
return rval
|
||||||
|
return val
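
    # Note on the toggle-property branch above (a sketch, not part of the
    # original module): toggle properties such as bold combine by parity
    # across the style inheritance chain, i.e. an odd number of "on" values
    # yields True. For example, if a base style and a derived style both set
    # bold and direct formatting is silent, sum([1, 1]) % 2 == 0, so the run
    # is rendered non-bold, matching Word's behavior.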

    def resolve_paragraph(self, p):
        ans = self.para_cache.get(p, None)
        if ans is None:
            linked_style = None
            ans = self.para_cache[p] = ParagraphStyle(self.namespace)
            ans.style_name = None
            direct_formatting = None
            is_section_break = False
            for pPr in self.namespace.XPath('./w:pPr')(p):
                ps = ParagraphStyle(self.namespace, pPr)
                if direct_formatting is None:
                    direct_formatting = ps
                else:
                    direct_formatting.update(ps)
                if self.namespace.XPath('./w:sectPr')(pPr):
                    is_section_break = True

            if direct_formatting is None:
                direct_formatting = ParagraphStyle(self.namespace)
            parent_styles = []
            if self.default_paragraph_style is not None:
                parent_styles.append(self.default_paragraph_style)
            ts = self.tables.para_style(p)
            if ts is not None:
                parent_styles.append(ts)

            default_para = self.default_styles.get('paragraph', None)
            if direct_formatting.linked_style is not None:
                ls = linked_style = self.get(direct_formatting.linked_style)
                if ls is not None:
                    ans.style_name = ls.name
                    ps = ls.paragraph_style
                    if ps is not None:
                        parent_styles.append(ps)
                    if ls.character_style is not None:
                        self.para_char_cache[p] = ls.character_style
            elif default_para is not None:
                if default_para.paragraph_style is not None:
                    parent_styles.append(default_para.paragraph_style)
                if default_para.character_style is not None:
                    self.para_char_cache[p] = default_para.character_style

            def has_numbering(block_style):
                num_id, lvl = getattr(block_style, 'numbering_id', inherit), getattr(block_style, 'numbering_level', inherit)
                return num_id is not None and num_id is not inherit and lvl is not None and lvl is not inherit

            is_numbering = has_numbering(direct_formatting)
            is_section_break = is_section_break and not self.namespace.XPath('./w:r')(p)

            if is_numbering and not is_section_break:
                num_id, lvl = direct_formatting.numbering_id, direct_formatting.numbering_level
                p.set('calibre_num_id', '%s:%s' % (lvl, num_id))
                ps = self.numbering.get_para_style(num_id, lvl)
                if ps is not None:
                    parent_styles.append(ps)
            if (
                    not is_numbering and not is_section_break and linked_style is not None and
                    has_numbering(linked_style.paragraph_style)
            ):
                num_id, lvl = linked_style.paragraph_style.numbering_id, linked_style.paragraph_style.numbering_level
                p.set('calibre_num_id', '%s:%s' % (lvl, num_id))
                is_numbering = True
                ps = self.numbering.get_para_style(num_id, lvl)
                if ps is not None:
                    parent_styles.append(ps)

            for attr in ans.all_properties:
                if not (is_numbering and attr == 'text_indent'):  # skip text-indent for lists
                    setattr(ans, attr, self.para_val(parent_styles, direct_formatting, attr))
            ans.linked_style = direct_formatting.linked_style
        return ans

    def resolve_run(self, r):
        ans = self.run_cache.get(r, None)
        if ans is None:
            p = self.namespace.XPath('ancestor::w:p[1]')(r)
            p = p[0] if p else None
            ans = self.run_cache[r] = RunStyle(self.namespace)
            direct_formatting = None
            for rPr in self.namespace.XPath('./w:rPr')(r):
                rs = RunStyle(self.namespace, rPr)
                if direct_formatting is None:
                    direct_formatting = rs
                else:
                    direct_formatting.update(rs)

            if direct_formatting is None:
                direct_formatting = RunStyle(self.namespace)

            parent_styles = []
            default_char = self.default_styles.get('character', None)
            if self.default_character_style is not None:
                parent_styles.append(self.default_character_style)
            pstyle = self.para_char_cache.get(p, None)
            if pstyle is not None:
                parent_styles.append(pstyle)
            # As best as I can understand the spec, table overrides should be
            # applied before paragraph overrides, but Word does it this way,
            # see the December 2007 table header in the demo document.
            ts = self.tables.run_style(p)
            if ts is not None:
                parent_styles.append(ts)
            if direct_formatting.linked_style is not None:
                ls = getattr(self.get(direct_formatting.linked_style), 'character_style', None)
                if ls is not None:
                    parent_styles.append(ls)
            elif default_char is not None and default_char.character_style is not None:
                parent_styles.append(default_char.character_style)

            for attr in ans.all_properties:
                setattr(ans, attr, self.run_val(parent_styles, direct_formatting, attr))

            if ans.font_family is not inherit:
                ff = self.theme.resolve_font_family(ans.font_family)
                ans.font_family = self.fonts.family_for(ff, ans.b, ans.i)

        return ans

    def resolve(self, obj):
        if obj.tag.endswith('}p'):
            return self.resolve_paragraph(obj)
        if obj.tag.endswith('}r'):
            return self.resolve_run(obj)

    def cascade(self, layers):
        self.body_font_family = 'serif'
        self.body_font_size = '10pt'
        self.body_color = 'black'

        def promote_property(char_styles, block_style, prop):
            vals = {getattr(s, prop) for s in char_styles}
            if len(vals) == 1:
                # All the character styles have the same value
                for s in char_styles:
                    setattr(s, prop, inherit)
                setattr(block_style, prop, next(iter(vals)))

        for p, runs in iteritems(layers):
            has_links = '1' in {r.get('is-link', None) for r in runs}
            char_styles = [self.resolve_run(r) for r in runs]
            block_style = self.resolve_paragraph(p)
            for prop in ('font_family', 'font_size', 'cs_font_family', 'cs_font_size', 'color'):
                if has_links and prop == 'color':
                    # We cannot promote color as browser rendering engines
                    # will override the link color, setting it to blue, unless
                    # the color is specified on the link element itself
                    continue
                promote_property(char_styles, block_style, prop)
            for s in char_styles:
                if s.text_decoration == 'none':
                    # The default text decoration is 'none'
                    s.text_decoration = inherit

        def promote_most_common(block_styles, prop, default):
            c = Counter()
            for s in block_styles:
                val = getattr(s, prop)
                if val is not inherit:
                    c[val] += 1
            val = None
            if c:
                val = c.most_common(1)[0][0]
                for s in block_styles:
                    oval = getattr(s, prop)
                    if oval is inherit:
                        if default != val:
                            setattr(s, prop, default)
                    elif oval == val:
                        setattr(s, prop, inherit)
            return val

        block_styles = tuple(self.resolve_paragraph(p) for p in layers)

        ff = promote_most_common(block_styles, 'font_family', self.body_font_family)
        if ff is not None:
            self.body_font_family = ff

        fs = promote_most_common(block_styles, 'font_size', int(self.body_font_size[:2]))
        if fs is not None:
            self.body_font_size = '%.3gpt' % fs

        color = promote_most_common(block_styles, 'color', self.body_color)
        if color is not None:
            self.body_color = color

    def resolve_numbering(self, numbering):
        # When a numPr element appears inside a paragraph style, the lvl info
        # must be discarded and pStyle used instead.
        self.numbering = numbering
        for style in self:
            ps = style.paragraph_style
            if ps is not None and ps.numbering_id is not inherit:
                lvl = numbering.get_pstyle(ps.numbering_id, style.style_id)
                if lvl is None:
                    ps.numbering_id = ps.numbering_level = inherit
                else:
                    ps.numbering_level = lvl

    def apply_contextual_spacing(self, paras):
        last_para = None
        for p in paras:
            if last_para is not None:
                ls = self.resolve_paragraph(last_para)
                ps = self.resolve_paragraph(p)
                if ls.linked_style is not None and ls.linked_style == ps.linked_style:
                    if ls.contextualSpacing is True:
                        ls.margin_bottom = 0
                    if ps.contextualSpacing is True:
                        ps.margin_top = 0
            last_para = p

    def apply_section_page_breaks(self, paras):
        for p in paras:
            ps = self.resolve_paragraph(p)
            ps.pageBreakBefore = True

    def register(self, css, prefix):
        h = hash(frozenset(iteritems(css)))
        ans, _ = self.classes.get(h, (None, None))
        if ans is None:
            self.counter[prefix] += 1
            ans = '%s_%d' % (prefix, self.counter[prefix])
            self.classes[h] = (ans, css)
        return ans
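
    # Deduplication sketch for register() (illustration only, not part of the
    # original module): two identical css dicts hash to the same frozenset of
    # items, so they share one generated class name. The first distinct block
    # style becomes 'block_1', the second distinct one 'block_2', and a later
    # repeat of the first style returns 'block_1' again instead of minting a
    # new class.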
|
||||||
|
|
||||||
|
def generate_classes(self):
|
||||||
|
for bs in itervalues(self.para_cache):
|
||||||
|
css = bs.css
|
||||||
|
if css:
|
||||||
|
self.register(css, 'block')
|
||||||
|
for bs in itervalues(self.run_cache):
|
||||||
|
css = bs.css
|
||||||
|
if css:
|
||||||
|
self.register(css, 'text')
|
||||||
|
|
||||||
|
def class_name(self, css):
|
||||||
|
h = hash(frozenset(iteritems(css)))
|
||||||
|
return self.classes.get(h, (None, None))[0]
|
||||||
|
|
||||||
|
def generate_css(self, dest_dir, docx, notes_nopb, nosupsub):
|
||||||
|
ef = self.fonts.embed_fonts(dest_dir, docx)
|
||||||
|
|
||||||
|
s = '''\
|
||||||
|
body { font-family: %s; font-size: %s; color: %s }
|
||||||
|
|
||||||
|
/* In word all paragraphs have zero margins unless explicitly specified in a style */
|
||||||
|
p, h1, h2, h3, h4, h5, h6, div { margin: 0; padding: 0 }
|
||||||
|
/* In word headings only have bold font if explicitly specified,
|
||||||
|
similarly the font size is the body font size, unless explicitly set. */
|
||||||
|
h1, h2, h3, h4, h5, h6 { font-weight: normal; font-size: 1rem }
|
||||||
|
/* Setting padding-left to zero breaks rendering of lists, so we only set the other values to zero and leave padding-left for the user-agent */
|
||||||
|
ul, ol { margin: 0; padding-top: 0; padding-bottom: 0; padding-right: 0 }
|
||||||
|
|
||||||
|
/* The word hyperlink styling will set text-decoration to underline if needed */
|
||||||
|
a { text-decoration: none }
|
||||||
|
|
||||||
|
sup.noteref a { text-decoration: none }
|
||||||
|
|
||||||
|
h1.notes-header { page-break-before: always }
|
||||||
|
|
||||||
|
dl.footnote dt { font-size: large }
|
||||||
|
|
||||||
|
dl.footnote dt a { text-decoration: none }
|
||||||
|
|
||||||
|
'''
|
||||||
|
|
||||||
|
if not notes_nopb:
|
||||||
|
s += '''\
|
||||||
|
dl.footnote { page-break-after: always }
|
||||||
|
dl.footnote:last-of-type { page-break-after: avoid }
|
||||||
|
'''
|
||||||
|
|
||||||
|
s = s + '''\
|
||||||
|
span.tab { white-space: pre }
|
||||||
|
|
||||||
|
p.index-entry { text-indent: 0pt; }
|
||||||
|
p.index-entry a:visited { color: blue }
|
||||||
|
p.index-entry a:hover { color: red }
|
||||||
|
'''
|
||||||
|
|
||||||
|
if nosupsub:
|
||||||
|
s = s + '''\
|
||||||
|
sup { vertical-align: top }
|
||||||
|
sub { vertical-align: bottom }
|
||||||
|
'''
|
||||||
|
|
||||||
|
prefix = textwrap.dedent(s) % (self.body_font_family, self.body_font_size, self.body_color)
|
||||||
|
if ef:
|
||||||
|
prefix = ef + '\n' + prefix
|
||||||
|
|
||||||
|
ans = []
|
||||||
|
for (cls, css) in sorted(itervalues(self.classes), key=lambda x:x[0]):
|
||||||
|
b = ('\t%s: %s;' % (k, v) for k, v in iteritems(css))
|
||||||
|
b = '\n'.join(b)
|
||||||
|
ans.append('.%s {\n%s\n}\n' % (cls, b.rstrip(';')))
|
||||||
|
return prefix + '\n' + '\n'.join(ans)
|

700
ebook_converter/ebooks/docx/tables.py
Normal file
@@ -0,0 +1,700 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

from lxml.html.builder import TABLE, TR, TD

from calibre.ebooks.docx.block_styles import inherit, read_shd as rs, read_border, binary_property, border_props, ParagraphStyle, border_to_css
from calibre.ebooks.docx.char_styles import RunStyle
from polyglot.builtins import filter, iteritems, itervalues, range, unicode_type

# Read from XML {{{
read_shd = rs
edges = ('left', 'top', 'right', 'bottom')


def _read_width(elem, get):
    ans = inherit
    try:
        w = int(get(elem, 'w:w'))
    except (TypeError, ValueError):
        w = 0
    typ = get(elem, 'w:type', 'auto')
    if typ == 'nil':
        ans = '0'
    elif typ == 'auto':
        ans = 'auto'
    elif typ == 'dxa':
        ans = '%.3gpt' % (w/20)
    elif typ == 'pct':
        ans = '%.3g%%' % (w/50)
    return ans
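
# Unit note for _read_width above (a sketch, not part of the original
# module): OOXML expresses widths in 'dxa' (twentieths of a point) or 'pct'
# (fiftieths of a percent), hence the divisions. For example:
#   w=1440, type='dxa'  ->  '%.3gpt' % (1440/20)  ==  '72pt'
#   w=2500, type='pct'  ->  '%.3g%%' % (2500/50)  ==  '50%'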


def read_width(parent, dest, XPath, get):
    ans = inherit
    for tblW in XPath('./w:tblW')(parent):
        ans = _read_width(tblW, get)
    setattr(dest, 'width', ans)


def read_cell_width(parent, dest, XPath, get):
    ans = inherit
    for tblW in XPath('./w:tcW')(parent):
        ans = _read_width(tblW, get)
    setattr(dest, 'width', ans)


def read_padding(parent, dest, XPath, get):
    name = 'tblCellMar' if parent.tag.endswith('}tblPr') else 'tcMar'
    ans = {x: inherit for x in edges}
    for mar in XPath('./w:%s' % name)(parent):
        for x in edges:
            for edge in XPath('./w:%s' % x)(mar):
                ans[x] = _read_width(edge, get)
    for x in edges:
        setattr(dest, 'cell_padding_%s' % x, ans[x])


def read_justification(parent, dest, XPath, get):
    left = right = inherit
    for jc in XPath('./w:jc[@w:val]')(parent):
        val = get(jc, 'w:val')
        if not val:
            continue
        if val == 'left':
            right = 'auto'
        elif val == 'right':
            left = 'auto'
        elif val == 'center':
            left = right = 'auto'
    setattr(dest, 'margin_left', left)
    setattr(dest, 'margin_right', right)
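
# Alignment note for read_justification above (illustration only): w:jc is
# mapped onto CSS auto margins, so 'left' keeps the table flush left
# (margin-right: auto), 'right' pushes it flush right (margin-left: auto),
# and 'center' centers it (both margins auto).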


def read_spacing(parent, dest, XPath, get):
    ans = inherit
    for cs in XPath('./w:tblCellSpacing')(parent):
        ans = _read_width(cs, get)
    setattr(dest, 'spacing', ans)


def read_float(parent, dest, XPath, get):
    ans = inherit
    for x in XPath('./w:tblpPr')(parent):
        ans = {k.rpartition('}')[-1]: v for k, v in iteritems(x.attrib)}
    setattr(dest, 'float', ans)


def read_indent(parent, dest, XPath, get):
    ans = inherit
    for cs in XPath('./w:tblInd')(parent):
        ans = _read_width(cs, get)
    setattr(dest, 'indent', ans)


border_edges = ('left', 'top', 'right', 'bottom', 'insideH', 'insideV')


def read_borders(parent, dest, XPath, get):
    name = 'tblBorders' if parent.tag.endswith('}tblPr') else 'tcBorders'
    read_border(parent, dest, XPath, get, border_edges, name)


def read_height(parent, dest, XPath, get):
    ans = inherit
    for rh in XPath('./w:trHeight')(parent):
        rule = get(rh, 'w:hRule', 'auto')
        if rule in {'auto', 'atLeast', 'exact'}:
            val = get(rh, 'w:val')
            ans = (rule, val)
    setattr(dest, 'height', ans)


def read_vertical_align(parent, dest, XPath, get):
    ans = inherit
    for va in XPath('./w:vAlign')(parent):
        val = get(va, 'w:val')
        ans = {'center': 'middle', 'top': 'top', 'bottom': 'bottom'}.get(val, 'middle')
    setattr(dest, 'vertical_align', ans)


def read_col_span(parent, dest, XPath, get):
    ans = inherit
    for gs in XPath('./w:gridSpan')(parent):
        try:
            ans = int(get(gs, 'w:val'))
        except (TypeError, ValueError):
            continue
    setattr(dest, 'col_span', ans)


def read_merge(parent, dest, XPath, get):
    for x in ('hMerge', 'vMerge'):
        ans = inherit
        for m in XPath('./w:%s' % x)(parent):
            ans = get(m, 'w:val', 'continue')
        setattr(dest, x, ans)


def read_band_size(parent, dest, XPath, get):
    for x in ('Col', 'Row'):
        ans = 1
        for y in XPath('./w:tblStyle%sBandSize' % x)(parent):
            try:
                ans = int(get(y, 'w:val'))
            except (TypeError, ValueError):
                continue
        setattr(dest, '%s_band_size' % x.lower(), ans)


def read_look(parent, dest, XPath, get):
    ans = 0
    for x in XPath('./w:tblLook')(parent):
        try:
            ans = int(get(x, 'w:val'), 16)
        except (ValueError, TypeError):
            continue
    setattr(dest, 'look', ans)

# }}}


def clone(style):
    if style is None:
        return None
    try:
        ans = type(style)(style.namespace)
    except TypeError:
        return None
    ans.update(style)
    return ans


class Style(object):

    is_bidi = False

    def update(self, other):
        for prop in self.all_properties:
            nval = getattr(other, prop)
            if nval is not inherit:
                setattr(self, prop, nval)

    def apply_bidi(self):
        self.is_bidi = True

    def convert_spacing(self):
        ans = {}
        if self.spacing is not inherit:
            if self.spacing in {'auto', '0'}:
                ans['border-collapse'] = 'collapse'
            else:
                ans['border-collapse'] = 'separate'
                ans['border-spacing'] = self.spacing
        return ans

    def convert_border(self):
        c = {}
        for x in edges:
            border_to_css(x, self, c)
            val = getattr(self, 'padding_%s' % x)
            if val is not inherit:
                c['padding-%s' % x] = '%.3gpt' % val
        if self.is_bidi:
            for a in ('padding-%s', 'border-%s-style', 'border-%s-color', 'border-%s-width'):
                l, r = c.get(a % 'left'), c.get(a % 'right')
                if l is not None:
                    c[a % 'right'] = l
                if r is not None:
                    c[a % 'left'] = r
        return c


class RowStyle(Style):

    all_properties = ('height', 'cantSplit', 'hidden', 'spacing',)

    def __init__(self, namespace, trPr=None):
        self.namespace = namespace
        if trPr is None:
            for p in self.all_properties:
                setattr(self, p, inherit)
        else:
            for p in ('hidden', 'cantSplit'):
                setattr(self, p, binary_property(trPr, p, namespace.XPath, namespace.get))
            for p in ('spacing', 'height'):
                f = globals()['read_%s' % p]
                f(trPr, self, namespace.XPath, namespace.get)
        self._css = None

    @property
    def css(self):
        if self._css is None:
            c = self._css = {}
            if self.hidden is True:
                c['display'] = 'none'
            if self.cantSplit is True:
                c['page-break-inside'] = 'avoid'
            if self.height is not inherit:
                rule, val = self.height
                if rule != 'auto':
                    try:
                        c['min-height' if rule == 'atLeast' else 'height'] = '%.3gpt' % (int(val)/20)
                    except (ValueError, TypeError):
                        pass
            c.update(self.convert_spacing())
        return self._css


class CellStyle(Style):

    all_properties = ('background_color', 'cell_padding_left', 'cell_padding_right', 'cell_padding_top',
        'cell_padding_bottom', 'width', 'vertical_align', 'col_span', 'vMerge', 'hMerge', 'row_span',
    ) + tuple(k % edge for edge in border_edges for k in border_props)

    def __init__(self, namespace, tcPr=None):
        self.namespace = namespace
        if tcPr is None:
            for p in self.all_properties:
                setattr(self, p, inherit)
        else:
            for x in ('borders', 'shd', 'padding', 'cell_width', 'vertical_align', 'col_span', 'merge'):
                f = globals()['read_%s' % x]
                f(tcPr, self, namespace.XPath, namespace.get)
            self.row_span = inherit
        self._css = None

    @property
    def css(self):
        if self._css is None:
            self._css = c = {}
            if self.background_color is not inherit:
                c['background-color'] = self.background_color
            if self.width not in (inherit, 'auto'):
                c['width'] = self.width
            c['vertical-align'] = 'top' if self.vertical_align is inherit else self.vertical_align
            for x in edges:
                val = getattr(self, 'cell_padding_%s' % x)
                if val not in (inherit, 'auto'):
                    c['padding-%s' % x] = val
                elif val is inherit and x in {'left', 'right'}:
                    c['padding-%s' % x] = '%.3gpt' % (115/20)
            # In Word, tables are apparently rendered with some default top
            # and bottom padding, irrespective of the cellMargin values.
            # Simulate that here.
            for x in ('top', 'bottom'):
                if c.get('padding-%s' % x, '0pt') == '0pt':
                    c['padding-%s' % x] = '0.5ex'
            c.update(self.convert_border())

        return self._css


class TableStyle(Style):

    all_properties = (
        'width', 'float', 'cell_padding_left', 'cell_padding_right', 'cell_padding_top',
        'cell_padding_bottom', 'margin_left', 'margin_right', 'background_color',
        'spacing', 'indent', 'overrides', 'col_band_size', 'row_band_size', 'look', 'bidi',
    ) + tuple(k % edge for edge in border_edges for k in border_props)

    def __init__(self, namespace, tblPr=None):
        self.namespace = namespace
        if tblPr is None:
            for p in self.all_properties:
                setattr(self, p, inherit)
        else:
            self.overrides = inherit
            self.bidi = binary_property(tblPr, 'bidiVisual', namespace.XPath, namespace.get)
            for x in ('width', 'float', 'padding', 'shd', 'justification', 'spacing', 'indent', 'borders', 'band_size', 'look'):
                f = globals()['read_%s' % x]
                f(tblPr, self, self.namespace.XPath, self.namespace.get)
            parent = tblPr.getparent()
            if self.namespace.is_tag(parent, 'w:style'):
                self.overrides = {}
                for tblStylePr in self.namespace.XPath('./w:tblStylePr[@w:type]')(parent):
                    otype = self.namespace.get(tblStylePr, 'w:type')
                    orides = self.overrides[otype] = {}
                    for tblPr in self.namespace.XPath('./w:tblPr')(tblStylePr):
                        orides['table'] = TableStyle(self.namespace, tblPr)
                    for trPr in self.namespace.XPath('./w:trPr')(tblStylePr):
                        orides['row'] = RowStyle(self.namespace, trPr)
                    for tcPr in self.namespace.XPath('./w:tcPr')(tblStylePr):
                        orides['cell'] = CellStyle(self.namespace, tcPr)
                    for pPr in self.namespace.XPath('./w:pPr')(tblStylePr):
                        orides['para'] = ParagraphStyle(self.namespace, pPr)
                    for rPr in self.namespace.XPath('./w:rPr')(tblStylePr):
                        orides['run'] = RunStyle(self.namespace, rPr)
        self._css = None

    def resolve_based_on(self, parent):
        for p in self.all_properties:
            val = getattr(self, p)
            if val is inherit:
                setattr(self, p, getattr(parent, p))

    @property
    def css(self):
        if self._css is None:
            c = self._css = {}
            if self.width not in (inherit, 'auto'):
                c['width'] = self.width
            for x in ('background_color', 'margin_left', 'margin_right'):
                val = getattr(self, x)
                if val is not inherit:
                    c[x.replace('_', '-')] = val
            if self.indent not in (inherit, 'auto') and self.margin_left != 'auto':
                c['margin-left'] = self.indent
            if self.float is not inherit:
                for x in ('left', 'top', 'right', 'bottom'):
                    val = self.float.get('%sFromText' % x, 0)
                    try:
                        val = '%.3gpt' % (int(val) / 20)
                    except (ValueError, TypeError):
                        val = '0'
                    c['margin-%s' % x] = val
                if 'tblpXSpec' in self.float:
                    c['float'] = 'right' if self.float['tblpXSpec'] in {'right', 'outside'} else 'left'
                else:
                    page = self.page
                    page_width = page.width - page.margin_left - page.margin_right
                    try:
                        x = int(self.float['tblpX']) / 20
                    except (KeyError, ValueError, TypeError):
                        x = 0
                    c['float'] = 'left' if (x/page_width) < 0.65 else 'right'
            c.update(self.convert_spacing())
            if 'border-collapse' not in c:
                c['border-collapse'] = 'collapse'
            c.update(self.convert_border())

        return self._css


class Table(object):

    def __init__(self, namespace, tbl, styles, para_map, is_sub_table=False):
        self.namespace = namespace
        self.tbl = tbl
        self.styles = styles
        self.is_sub_table = is_sub_table

        # Read Table Style
        style = {'table': TableStyle(self.namespace)}
for tblPr in self.namespace.XPath('./w:tblPr')(tbl):
|
||||||
|
for ts in self.namespace.XPath('./w:tblStyle[@w:val]')(tblPr):
|
||||||
|
style_id = self.namespace.get(ts, 'w:val')
|
||||||
|
s = styles.get(style_id)
|
||||||
|
if s is not None:
|
||||||
|
if s.table_style is not None:
|
||||||
|
style['table'].update(s.table_style)
|
||||||
|
if s.paragraph_style is not None:
|
||||||
|
if 'paragraph' in style:
|
||||||
|
style['paragraph'].update(s.paragraph_style)
|
||||||
|
else:
|
||||||
|
style['paragraph'] = s.paragraph_style
|
||||||
|
if s.character_style is not None:
|
||||||
|
if 'run' in style:
|
||||||
|
style['run'].update(s.character_style)
|
||||||
|
else:
|
||||||
|
style['run'] = s.character_style
|
||||||
|
style['table'].update(TableStyle(self.namespace, tblPr))
|
||||||
|
self.table_style, self.paragraph_style = style['table'], style.get('paragraph', None)
|
||||||
|
self.run_style = style.get('run', None)
|
||||||
|
self.overrides = self.table_style.overrides
|
||||||
|
if self.overrides is inherit:
|
||||||
|
self.overrides = {}
|
||||||
|
if 'wholeTable' in self.overrides and 'table' in self.overrides['wholeTable']:
|
||||||
|
self.table_style.update(self.overrides['wholeTable']['table'])
|
||||||
|
|
||||||
|
self.style_map = {}
|
||||||
|
self.paragraphs = []
|
||||||
|
self.cell_map = []
|
||||||
|
|
||||||
|
rows = self.namespace.XPath('./w:tr')(tbl)
|
||||||
|
for r, tr in enumerate(rows):
|
||||||
|
overrides = self.get_overrides(r, None, len(rows), None)
|
||||||
|
self.resolve_row_style(tr, overrides)
|
||||||
|
cells = self.namespace.XPath('./w:tc')(tr)
|
||||||
|
self.cell_map.append([])
|
||||||
|
for c, tc in enumerate(cells):
|
||||||
|
overrides = self.get_overrides(r, c, len(rows), len(cells))
|
||||||
|
self.resolve_cell_style(tc, overrides, r, c, len(rows), len(cells))
|
||||||
|
self.cell_map[-1].append(tc)
|
||||||
|
for p in self.namespace.XPath('./w:p')(tc):
|
||||||
|
para_map[p] = self
|
||||||
|
self.paragraphs.append(p)
|
||||||
|
self.resolve_para_style(p, overrides)
|
||||||
|
|
||||||
|
self.handle_merged_cells()
|
||||||
|
self.sub_tables = {x:Table(namespace, x, styles, para_map, is_sub_table=True) for x in self.namespace.XPath('./w:tr/w:tc/w:tbl')(tbl)}
|
||||||
|
|
||||||
|
@property
|
||||||
|
def bidi(self):
|
||||||
|
return self.table_style.bidi is True
|
||||||
|
|
||||||
|
def override_allowed(self, name):
|
||||||
|
'Check if the named override is allowed by the tblLook element'
|
||||||
|
if name.endswith('Cell') or name == 'wholeTable':
|
||||||
|
return True
|
||||||
|
look = self.table_style.look
|
||||||
|
if (look & 0x0020 and name == 'firstRow') or (look & 0x0040 and name == 'lastRow') or \
|
||||||
|
(look & 0x0080 and name == 'firstCol') or (look & 0x0100 and name == 'lastCol'):
|
||||||
|
return True
|
||||||
|
if name.startswith('band'):
|
||||||
|
if name.endswith('Horz'):
|
||||||
|
return not bool(look & 0x0200)
|
||||||
|
if name.endswith('Vert'):
|
||||||
|
return not bool(look & 0x0400)
|
||||||
|
return False
|
||||||
|
|
||||||
|
def get_overrides(self, r, c, num_of_rows, num_of_cols_in_row):
|
||||||
|
'List of possible overrides for the given para'
|
||||||
|
overrides = ['wholeTable']
|
||||||
|
|
||||||
|
def divisor(m, n):
|
||||||
|
return (m - (m % n)) // n
|
||||||
|
if c is not None:
|
||||||
|
odd_column_band = (divisor(c, self.table_style.col_band_size) % 2) == 1
|
||||||
|
overrides.append('band%dVert' % (1 if odd_column_band else 2))
|
||||||
|
odd_row_band = (divisor(r, self.table_style.row_band_size) % 2) == 1
|
||||||
|
overrides.append('band%dHorz' % (1 if odd_row_band else 2))
|
||||||
|
|
||||||
|
# According to the OOXML spec columns should have higher override
|
||||||
|
# priority than rows, but Word seems to do it the other way around.
|
||||||
|
if c is not None:
|
||||||
|
if c == 0:
|
||||||
|
overrides.append('firstCol')
|
||||||
|
if c >= num_of_cols_in_row - 1:
|
||||||
|
overrides.append('lastCol')
|
||||||
|
if r == 0:
|
||||||
|
overrides.append('firstRow')
|
||||||
|
if r >= num_of_rows - 1:
|
||||||
|
overrides.append('lastRow')
|
||||||
|
if c is not None:
|
||||||
|
if r == 0:
|
||||||
|
if c == 0:
|
||||||
|
overrides.append('nwCell')
|
||||||
|
if c == num_of_cols_in_row - 1:
|
||||||
|
overrides.append('neCell')
|
||||||
|
if r == num_of_rows - 1:
|
||||||
|
if c == 0:
|
||||||
|
overrides.append('swCell')
|
||||||
|
if c == num_of_cols_in_row - 1:
|
||||||
|
overrides.append('seCell')
|
||||||
|
return tuple(filter(self.override_allowed, overrides))
|
||||||
|
|
||||||
|
def resolve_row_style(self, tr, overrides):
|
||||||
|
rs = RowStyle(self.namespace)
|
||||||
|
for o in overrides:
|
||||||
|
if o in self.overrides:
|
||||||
|
ovr = self.overrides[o]
|
||||||
|
ors = ovr.get('row', None)
|
||||||
|
if ors is not None:
|
||||||
|
rs.update(ors)
|
||||||
|
|
||||||
|
for trPr in self.namespace.XPath('./w:trPr')(tr):
|
||||||
|
rs.update(RowStyle(self.namespace, trPr))
|
||||||
|
if self.bidi:
|
||||||
|
rs.apply_bidi()
|
||||||
|
self.style_map[tr] = rs
|
||||||
|
|
||||||
|
def resolve_cell_style(self, tc, overrides, row, col, rows, cols_in_row):
|
||||||
|
cs = CellStyle(self.namespace)
|
||||||
|
for o in overrides:
|
||||||
|
if o in self.overrides:
|
||||||
|
ovr = self.overrides[o]
|
||||||
|
ors = ovr.get('cell', None)
|
||||||
|
if ors is not None:
|
||||||
|
cs.update(ors)
|
||||||
|
|
||||||
|
for tcPr in self.namespace.XPath('./w:tcPr')(tc):
|
||||||
|
cs.update(CellStyle(self.namespace, tcPr))
|
||||||
|
|
||||||
|
for x in edges:
|
||||||
|
p = 'cell_padding_%s' % x
|
||||||
|
val = getattr(cs, p)
|
||||||
|
if val is inherit:
|
||||||
|
setattr(cs, p, getattr(self.table_style, p))
|
||||||
|
|
||||||
|
is_inside_edge = (
|
||||||
|
(x == 'left' and col > 0) or
|
||||||
|
(x == 'top' and row > 0) or
|
||||||
|
(x == 'right' and col < cols_in_row - 1) or
|
||||||
|
(x == 'bottom' and row < rows -1)
|
||||||
|
)
|
||||||
|
inside_edge = ('insideH' if x in {'top', 'bottom'} else 'insideV') if is_inside_edge else None
|
||||||
|
for prop in border_props:
|
||||||
|
if not prop.startswith('border'):
|
||||||
|
continue
|
||||||
|
eprop = prop % x
|
||||||
|
iprop = (prop % inside_edge) if inside_edge else None
|
||||||
|
val = getattr(cs, eprop)
|
||||||
|
if val is inherit and iprop is not None:
|
||||||
|
# Use the insideX borders if the main cell borders are not
|
||||||
|
# specified
|
||||||
|
val = getattr(cs, iprop)
|
||||||
|
if val is inherit:
|
||||||
|
val = getattr(self.table_style, iprop)
|
||||||
|
if not is_inside_edge and val == 'none':
|
||||||
|
# Cell borders must override table borders even when the
|
||||||
|
# table border is not null and the cell border is null.
|
||||||
|
val = 'hidden'
|
||||||
|
setattr(cs, eprop, val)
|
||||||
|
|
||||||
|
if self.bidi:
|
||||||
|
cs.apply_bidi()
|
||||||
|
self.style_map[tc] = cs
|
||||||
|
|
||||||
|
def resolve_para_style(self, p, overrides):
|
||||||
|
text_styles = [clone(self.paragraph_style), clone(self.run_style)]
|
||||||
|
|
||||||
|
for o in overrides:
|
||||||
|
if o in self.overrides:
|
||||||
|
ovr = self.overrides[o]
|
||||||
|
for i, name in enumerate(('para', 'run')):
|
||||||
|
ops = ovr.get(name, None)
|
||||||
|
if ops is not None:
|
||||||
|
if text_styles[i] is None:
|
||||||
|
text_styles[i] = ops
|
||||||
|
else:
|
||||||
|
text_styles[i].update(ops)
|
||||||
|
self.style_map[p] = text_styles
|
||||||
|
|
||||||
|
def handle_merged_cells(self):
|
||||||
|
if not self.cell_map:
|
||||||
|
return
|
||||||
|
# Handle vMerge
|
||||||
|
max_col_num = max(len(r) for r in self.cell_map)
|
||||||
|
for c in range(max_col_num):
|
||||||
|
cells = [row[c] if c < len(row) else None for row in self.cell_map]
|
||||||
|
runs = [[]]
|
||||||
|
for cell in cells:
|
||||||
|
try:
|
||||||
|
s = self.style_map[cell]
|
||||||
|
except KeyError: # cell is None
|
||||||
|
s = CellStyle(self.namespace)
|
||||||
|
if s.vMerge == 'restart':
|
||||||
|
runs.append([cell])
|
||||||
|
elif s.vMerge == 'continue':
|
||||||
|
runs[-1].append(cell)
|
||||||
|
else:
|
||||||
|
runs.append([])
|
||||||
|
for run in runs:
|
||||||
|
if len(run) > 1:
|
||||||
|
self.style_map[run[0]].row_span = len(run)
|
||||||
|
for tc in run[1:]:
|
||||||
|
tc.getparent().remove(tc)
|
||||||
|
|
||||||
|
# Handle hMerge
|
||||||
|
for cells in self.cell_map:
|
||||||
|
runs = [[]]
|
||||||
|
for cell in cells:
|
||||||
|
try:
|
||||||
|
s = self.style_map[cell]
|
||||||
|
except KeyError: # cell is None
|
||||||
|
s = CellStyle(self.namespace)
|
||||||
|
if s.col_span is not inherit:
|
||||||
|
runs.append([])
|
||||||
|
continue
|
||||||
|
if s.hMerge == 'restart':
|
||||||
|
runs.append([cell])
|
||||||
|
elif s.hMerge == 'continue':
|
||||||
|
runs[-1].append(cell)
|
||||||
|
else:
|
||||||
|
runs.append([])
|
||||||
|
|
||||||
|
for run in runs:
|
||||||
|
if len(run) > 1:
|
||||||
|
self.style_map[run[0]].col_span = len(run)
|
||||||
|
for tc in run[1:]:
|
||||||
|
tc.getparent().remove(tc)
|
||||||
|
|
||||||
|
def __iter__(self):
|
||||||
|
for p in self.paragraphs:
|
||||||
|
yield p
|
||||||
|
for t in itervalues(self.sub_tables):
|
||||||
|
for p in t:
|
||||||
|
yield p
|
||||||
|
|
||||||
|
def apply_markup(self, rmap, page, parent=None):
|
||||||
|
table = TABLE('\n\t\t')
|
||||||
|
if self.bidi:
|
||||||
|
table.set('dir', 'rtl')
|
||||||
|
self.table_style.page = page
|
||||||
|
style_map = {}
|
||||||
|
if parent is None:
|
||||||
|
try:
|
||||||
|
first_para = rmap[next(iter(self))]
|
||||||
|
except StopIteration:
|
||||||
|
return
|
||||||
|
parent = first_para.getparent()
|
||||||
|
idx = parent.index(first_para)
|
||||||
|
parent.insert(idx, table)
|
||||||
|
else:
|
||||||
|
parent.append(table)
|
||||||
|
for row in self.namespace.XPath('./w:tr')(self.tbl):
|
||||||
|
tr = TR('\n\t\t\t')
|
||||||
|
style_map[tr] = self.style_map[row]
|
||||||
|
tr.tail = '\n\t\t'
|
||||||
|
table.append(tr)
|
||||||
|
for tc in self.namespace.XPath('./w:tc')(row):
|
||||||
|
td = TD()
|
||||||
|
style_map[td] = s = self.style_map[tc]
|
||||||
|
if s.col_span is not inherit:
|
||||||
|
td.set('colspan', unicode_type(s.col_span))
|
||||||
|
if s.row_span is not inherit:
|
||||||
|
td.set('rowspan', unicode_type(s.row_span))
|
||||||
|
td.tail = '\n\t\t\t'
|
||||||
|
tr.append(td)
|
||||||
|
for x in self.namespace.XPath('./w:p|./w:tbl')(tc):
|
||||||
|
if x.tag.endswith('}p'):
|
||||||
|
td.append(rmap[x])
|
||||||
|
else:
|
||||||
|
self.sub_tables[x].apply_markup(rmap, page, parent=td)
|
||||||
|
if len(tr):
|
||||||
|
tr[-1].tail = '\n\t\t'
|
||||||
|
if len(table):
|
||||||
|
table[-1].tail = '\n\t'
|
||||||
|
|
||||||
|
table_style = self.table_style.css
|
||||||
|
if table_style:
|
||||||
|
table.set('class', self.styles.register(table_style, 'table'))
|
||||||
|
for elem, style in iteritems(style_map):
|
||||||
|
css = style.css
|
||||||
|
if css:
|
||||||
|
elem.set('class', self.styles.register(css, elem.tag))
|
||||||
|
|
||||||
|
|
||||||
|
class Tables(object):
|
||||||
|
|
||||||
|
def __init__(self, namespace):
|
||||||
|
self.tables = []
|
||||||
|
self.para_map = {}
|
||||||
|
self.sub_tables = set()
|
||||||
|
self.namespace = namespace
|
||||||
|
|
||||||
|
def register(self, tbl, styles):
|
||||||
|
if tbl in self.sub_tables:
|
||||||
|
return
|
||||||
|
self.tables.append(Table(self.namespace, tbl, styles, self.para_map))
|
||||||
|
self.sub_tables |= set(self.tables[-1].sub_tables)
|
||||||
|
|
||||||
|
def apply_markup(self, object_map, page_map):
|
||||||
|
rmap = {v:k for k, v in iteritems(object_map)}
|
||||||
|
for table in self.tables:
|
||||||
|
table.apply_markup(rmap, page_map[table.tbl])
|
||||||
|
|
||||||
|
def para_style(self, p):
|
||||||
|
table = self.para_map.get(p, None)
|
||||||
|
if table is not None:
|
||||||
|
return table.style_map.get(p, (None, None))[0]
|
||||||
|
|
||||||
|
def run_style(self, p):
|
||||||
|
table = self.para_map.get(p, None)
|
||||||
|
if table is not None:
|
||||||
|
return table.style_map.get(p, (None, None))[1]
|
||||||
29
ebook_converter/ebooks/docx/theme.py
Normal file
@@ -0,0 +1,29 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'


class Theme(object):

    def __init__(self, namespace):
        self.major_latin_font = 'Cambria'
        self.minor_latin_font = 'Calibri'
        self.namespace = namespace

    def __call__(self, root):
        for fs in self.namespace.XPath('//a:fontScheme')(root):
            for mj in self.namespace.XPath('./a:majorFont')(fs):
                for l in self.namespace.XPath('./a:latin[@typeface]')(mj):
                    self.major_latin_font = l.get('typeface')
            for mj in self.namespace.XPath('./a:minorFont')(fs):
                for l in self.namespace.XPath('./a:latin[@typeface]')(mj):
                    self.minor_latin_font = l.get('typeface')

    def resolve_font_family(self, ff):
        if ff.startswith('|'):
            ff = ff[1:-1]
            ff = self.major_latin_font if ff.startswith('major') else self.minor_latin_font
        return ff
839
ebook_converter/ebooks/docx/to_html.py
Normal file
@@ -0,0 +1,839 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

import sys, os, re, math, errno, uuid, numbers
from collections import OrderedDict, defaultdict

from lxml import html
from lxml.html.builder import (
    HTML, HEAD, TITLE, BODY, LINK, META, P, SPAN, BR, DIV, A, DT, DL, DD, H1)

from calibre import guess_type
from calibre.ebooks.docx.container import DOCX, fromstring
from calibre.ebooks.docx.names import XML, generate_anchor
from calibre.ebooks.docx.styles import Styles, inherit, PageProperties
from calibre.ebooks.docx.numbering import Numbering
from calibre.ebooks.docx.fonts import Fonts, is_symbol_font, map_symbol_text
from calibre.ebooks.docx.images import Images
from calibre.ebooks.docx.tables import Tables
from calibre.ebooks.docx.footnotes import Footnotes
from calibre.ebooks.docx.cleanup import cleanup_markup
from calibre.ebooks.docx.theme import Theme
from calibre.ebooks.docx.toc import create_toc
from calibre.ebooks.docx.fields import Fields
from calibre.ebooks.docx.settings import Settings
from calibre.ebooks.metadata.opf2 import OPFCreator
from calibre.utils.localization import canonicalize_lang, lang_as_iso639_1
from polyglot.builtins import iteritems, itervalues, filter, getcwd, map, unicode_type


NBSP = '\xa0'


class Text:

    def __init__(self, elem, attr, buf):
        self.elem, self.attr, self.buf = elem, attr, buf
        self.elems = [self.elem]

    def add_elem(self, elem):
        self.elems.append(elem)
        setattr(self.elem, self.attr, ''.join(self.buf))
        self.elem, self.attr, self.buf = elem, 'tail', []

    def __iter__(self):
        return iter(self.elems)


def html_lang(docx_lang):
    lang = canonicalize_lang(docx_lang)
    if lang and lang != 'und':
        lang = lang_as_iso639_1(lang)
        if lang:
            return lang


class Convert(object):

    def __init__(self, path_or_stream, dest_dir=None, log=None, detect_cover=True, notes_text=None, notes_nopb=False, nosupsub=False):
        self.docx = DOCX(path_or_stream, log=log)
        self.namespace = self.docx.namespace
        self.ms_pat = re.compile(r'\s{2,}')
        self.ws_pat = re.compile(r'[\n\r\t]')
        self.log = self.docx.log
        self.detect_cover = detect_cover
        self.notes_text = notes_text or _('Notes')
        self.notes_nopb = notes_nopb
        self.nosupsub = nosupsub
        self.dest_dir = dest_dir or getcwd()
        self.mi = self.docx.metadata
        self.body = BODY()
        self.theme = Theme(self.namespace)
        self.settings = Settings(self.namespace)
        self.tables = Tables(self.namespace)
        self.fields = Fields(self.namespace)
        self.styles = Styles(self.namespace, self.tables)
        self.images = Images(self.namespace, self.log)
        self.object_map = OrderedDict()
        self.html = HTML(
            HEAD(
                META(charset='utf-8'),
                TITLE(self.mi.title or _('Unknown')),
                LINK(rel='stylesheet', type='text/css', href='docx.css'),
            ),
            self.body
        )
        self.html.text = '\n\t'
        self.html[0].text = '\n\t\t'
        self.html[0].tail = '\n'
        for child in self.html[0]:
            child.tail = '\n\t\t'
        self.html[0][-1].tail = '\n\t'
        self.html[1].text = self.html[1].tail = '\n'
        lang = html_lang(self.mi.language)
        if lang:
            self.html.set('lang', lang)
            self.doc_lang = lang
        else:
            self.doc_lang = None

    def __call__(self):
        doc = self.docx.document
        relationships_by_id, relationships_by_type = self.docx.document_relationships
        self.resolve_alternate_content(doc)
        self.fields(doc, self.log)
        self.read_styles(relationships_by_type)
        self.images(relationships_by_id)
        self.layers = OrderedDict()
        self.framed = [[]]
        self.frame_map = {}
        self.framed_map = {}
        self.anchor_map = {}
        self.link_map = defaultdict(list)
        self.link_source_map = {}
        self.toc_anchor = None
        self.block_runs = []
        paras = []

        self.log.debug('Converting Word markup to HTML')

        self.read_page_properties(doc)
        self.current_rels = relationships_by_id
        for wp, page_properties in iteritems(self.page_map):
            self.current_page = page_properties
            if wp.tag.endswith('}p'):
                p = self.convert_p(wp)
                self.body.append(p)
                paras.append(wp)

        self.read_block_anchors(doc)
        self.styles.apply_contextual_spacing(paras)
        self.mark_block_runs(paras)
        # Apply page breaks at the start of every section, except the first
        # section (since that will be the start of the file)
        self.styles.apply_section_page_breaks(self.section_starts[1:])

        notes_header = None
        orig_rid_map = self.images.rid_map
        if self.footnotes.has_notes:
            self.body.append(H1(self.notes_text))
            notes_header = self.body[-1]
            notes_header.set('class', 'notes-header')
            for anchor, text, note in self.footnotes:
                dl = DL(id=anchor)
                dl.set('class', 'footnote')
                self.body.append(dl)
                dl.append(DT('[', A('←' + text, href='#back_%s' % anchor, title=text)))
                dl[-1][0].tail = ']'
                dl.append(DD())
                paras = []
                self.images.rid_map = self.current_rels = note.rels[0]
                for wp in note:
                    if wp.tag.endswith('}tbl'):
                        self.tables.register(wp, self.styles)
                        self.page_map[wp] = self.current_page
                    else:
                        p = self.convert_p(wp)
                        dl[-1].append(p)
                        paras.append(wp)
                self.styles.apply_contextual_spacing(paras)
                self.mark_block_runs(paras)

        for p, wp in iteritems(self.object_map):
            if len(p) > 0 and not p.text and len(p[0]) > 0 and not p[0].text and p[0][0].get('class', None) == 'tab':
                # Paragraph uses tabs for indentation, convert to text-indent
                parent = p[0]
                tabs = []
                for child in parent:
                    if child.get('class', None) == 'tab':
                        tabs.append(child)
                        if child.tail:
                            break
                    else:
                        break
                indent = len(tabs) * self.settings.default_tab_stop
                style = self.styles.resolve(wp)
                if style.text_indent is inherit or (hasattr(style.text_indent, 'endswith') and style.text_indent.endswith('pt')):
                    if style.text_indent is not inherit:
                        indent = float(style.text_indent[:-2]) + indent
                    style.text_indent = '%.3gpt' % indent
                    parent.text = tabs[-1].tail or ''
                    list(map(parent.remove, tabs))

        self.images.rid_map = orig_rid_map

        self.resolve_links()

        self.styles.cascade(self.layers)

        self.tables.apply_markup(self.object_map, self.page_map)

        numbered = []
        for html_obj, obj in iteritems(self.object_map):
            raw = obj.get('calibre_num_id', None)
            if raw is not None:
                lvl, num_id = raw.partition(':')[0::2]
                try:
                    lvl = int(lvl)
                except (TypeError, ValueError):
                    lvl = 0
                numbered.append((html_obj, num_id, lvl))
        self.numbering.apply_markup(numbered, self.body, self.styles, self.object_map, self.images)
        self.apply_frames()

        if len(self.body) > 0:
            self.body.text = '\n\t'
            for child in self.body:
                child.tail = '\n\t'
            self.body[-1].tail = '\n'

        self.log.debug('Converting styles to CSS')
        self.styles.generate_classes()
        for html_obj, obj in iteritems(self.object_map):
            style = self.styles.resolve(obj)
            if style is not None:
                css = style.css
                if css:
                    cls = self.styles.class_name(css)
                    if cls:
                        html_obj.set('class', cls)
        for html_obj, css in iteritems(self.framed_map):
            cls = self.styles.class_name(css)
            if cls:
                html_obj.set('class', cls)

        if notes_header is not None:
            for h in self.namespace.children(self.body, 'h1', 'h2', 'h3'):
                notes_header.tag = h.tag
                cls = h.get('class', None)
                if cls and cls != 'notes-header':
                    notes_header.set('class', '%s notes-header' % cls)
                break

        self.fields.polish_markup(self.object_map)

        self.log.debug('Cleaning up redundant markup generated by Word')
        self.cover_image = cleanup_markup(self.log, self.html, self.styles, self.dest_dir, self.detect_cover, self.namespace.XPath)

        return self.write(doc)

    def read_page_properties(self, doc):
        current = []
        self.page_map = OrderedDict()
        self.section_starts = []

        for p in self.namespace.descendants(doc, 'w:p', 'w:tbl'):
            if p.tag.endswith('}tbl'):
                self.tables.register(p, self.styles)
                current.append(p)
                continue
            sect = tuple(self.namespace.descendants(p, 'w:sectPr'))
            if sect:
                pr = PageProperties(self.namespace, sect)
                paras = current + [p]
                for x in paras:
                    self.page_map[x] = pr
                self.section_starts.append(paras[0])
                current = []
            else:
                current.append(p)

        if current:
            self.section_starts.append(current[0])
            last = self.namespace.XPath('./w:body/w:sectPr')(doc)
            pr = PageProperties(self.namespace, last)
            for x in current:
                self.page_map[x] = pr

    def resolve_alternate_content(self, doc):
        # For proprietary extensions in Word documents use the fallback,
        # spec-compliant form
        # See https://wiki.openoffice.org/wiki/OOXML/Markup_Compatibility_and_Extensibility
        for ac in self.namespace.descendants(doc, 'mc:AlternateContent'):
            choices = self.namespace.XPath('./mc:Choice')(ac)
            fallbacks = self.namespace.XPath('./mc:Fallback')(ac)
            if fallbacks:
                for choice in choices:
                    ac.remove(choice)

    def read_styles(self, relationships_by_type):

        def get_name(rtype, defname):
            name = relationships_by_type.get(rtype, None)
            if name is None:
                cname = self.docx.document_name.split('/')
                cname[-1] = defname
                if self.docx.exists('/'.join(cname)):
                    name = '/'.join(cname)
            if name and name.startswith('word/word') and not self.docx.exists(name):
                name = name.partition('/')[2]
            return name

        nname = get_name(self.namespace.names['NUMBERING'], 'numbering.xml')
        sname = get_name(self.namespace.names['STYLES'], 'styles.xml')
        sename = get_name(self.namespace.names['SETTINGS'], 'settings.xml')
        fname = get_name(self.namespace.names['FONTS'], 'fontTable.xml')
        tname = get_name(self.namespace.names['THEMES'], 'theme1.xml')
        foname = get_name(self.namespace.names['FOOTNOTES'], 'footnotes.xml')
        enname = get_name(self.namespace.names['ENDNOTES'], 'endnotes.xml')
        numbering = self.numbering = Numbering(self.namespace)
        footnotes = self.footnotes = Footnotes(self.namespace)
        fonts = self.fonts = Fonts(self.namespace)

        foraw = enraw = None
        forel, enrel = ({}, {}), ({}, {})
        if sename is not None:
            try:
                seraw = self.docx.read(sename)
            except KeyError:
                self.log.warn('Settings %s do not exist' % sename)
            except EnvironmentError as e:
                if e.errno != errno.ENOENT:
                    raise
                self.log.warn('Settings %s file missing' % sename)
            else:
                self.settings(fromstring(seraw))

        if foname is not None:
            try:
                foraw = self.docx.read(foname)
            except KeyError:
                self.log.warn('Footnotes %s do not exist' % foname)
            else:
                forel = self.docx.get_relationships(foname)
        if enname is not None:
            try:
                enraw = self.docx.read(enname)
            except KeyError:
                self.log.warn('Endnotes %s do not exist' % enname)
            else:
                enrel = self.docx.get_relationships(enname)
        footnotes(fromstring(foraw) if foraw else None, forel, fromstring(enraw) if enraw else None, enrel)

        if fname is not None:
            embed_relationships = self.docx.get_relationships(fname)[0]
            try:
                raw = self.docx.read(fname)
            except KeyError:
                self.log.warn('Fonts table %s does not exist' % fname)
            else:
                fonts(fromstring(raw), embed_relationships, self.docx, self.dest_dir)

        if tname is not None:
            try:
                raw = self.docx.read(tname)
            except KeyError:
                self.log.warn('Theme %s does not exist' % tname)
            else:
                self.theme(fromstring(raw))

        styles_loaded = False
        if sname is not None:
            try:
                raw = self.docx.read(sname)
            except KeyError:
                self.log.warn('Styles %s do not exist' % sname)
            else:
                self.styles(fromstring(raw), fonts, self.theme)
                styles_loaded = True
        if not styles_loaded:
            self.styles(None, fonts, self.theme)

        if nname is not None:
            try:
                raw = self.docx.read(nname)
            except KeyError:
                self.log.warn('Numbering styles %s do not exist' % nname)
            else:
                numbering(fromstring(raw), self.styles, self.docx.get_relationships(nname)[0])

        self.styles.resolve_numbering(numbering)

    def write(self, doc):
        toc = create_toc(doc, self.body, self.resolved_link_map, self.styles, self.object_map, self.log, self.namespace)
        raw = html.tostring(self.html, encoding='utf-8', doctype='<!DOCTYPE html>')
        with lopen(os.path.join(self.dest_dir, 'index.html'), 'wb') as f:
            f.write(raw)
        css = self.styles.generate_css(self.dest_dir, self.docx, self.notes_nopb, self.nosupsub)
        if css:
            with lopen(os.path.join(self.dest_dir, 'docx.css'), 'wb') as f:
                f.write(css.encode('utf-8'))

        opf = OPFCreator(self.dest_dir, self.mi)
        opf.toc = toc
        opf.create_manifest_from_files_in([self.dest_dir])
        for item in opf.manifest:
            if item.media_type == 'text/html':
                item.media_type = guess_type('a.xhtml')[0]
        opf.create_spine(['index.html'])
        if self.cover_image is not None:
            opf.guide.set_cover(self.cover_image)

        def process_guide(E, guide):
            if self.toc_anchor is not None:
                guide.append(E.reference(
                    href='index.html#' + self.toc_anchor, title=_('Table of Contents'), type='toc'))
        toc_file = os.path.join(self.dest_dir, 'toc.ncx')
        with lopen(os.path.join(self.dest_dir, 'metadata.opf'), 'wb') as of, open(toc_file, 'wb') as ncx:
            opf.render(of, ncx, 'toc.ncx', process_guide=process_guide)
        if os.path.getsize(toc_file) == 0:
            os.remove(toc_file)
        return os.path.join(self.dest_dir, 'metadata.opf')

    def read_block_anchors(self, doc):
        doc_anchors = frozenset(self.namespace.XPath('./w:body/w:bookmarkStart[@w:name]')(doc))
        if doc_anchors:
            current_bm = set()
            rmap = {v: k for k, v in iteritems(self.object_map)}
            for p in self.namespace.descendants(doc, 'w:p', 'w:bookmarkStart[@w:name]'):
                if p.tag.endswith('}p'):
                    if current_bm and p in rmap:
                        para = rmap[p]
                        if 'id' not in para.attrib:
                            para.set('id', generate_anchor(next(iter(current_bm)), frozenset(itervalues(self.anchor_map))))
                        for name in current_bm:
                            self.anchor_map[name] = para.get('id')
                        current_bm = set()
                elif p in doc_anchors:
                    anchor = self.namespace.get(p, 'w:name')
                    if anchor:
                        current_bm.add(anchor)

    def convert_p(self, p):
        dest = P()
        self.object_map[dest] = p
        style = self.styles.resolve_paragraph(p)
        self.layers[p] = []
        self.frame_map[p] = style.frame
        self.add_frame(dest, style.frame)

        current_anchor = None
        current_hyperlink = None
        hl_xpath = self.namespace.XPath('ancestor::w:hyperlink[1]')

        def p_parent(x):
            # Ensure that nested <w:p> tags are handled. These can occur if a
            # textbox is present inside a paragraph.
            while True:
                x = x.getparent()
                try:
                    if x.tag.endswith('}p'):
                        return x
                except AttributeError:
                    break

        for x in self.namespace.descendants(p, 'w:r', 'w:bookmarkStart', 'w:hyperlink', 'w:instrText'):
|
||||||
|
if p_parent(x) is not p:
|
||||||
|
continue
|
||||||
|
if x.tag.endswith('}r'):
|
||||||
|
span = self.convert_run(x)
|
||||||
|
if current_anchor is not None:
|
||||||
|
(dest if len(dest) == 0 else span).set('id', current_anchor)
|
||||||
|
current_anchor = None
|
||||||
|
if current_hyperlink is not None:
|
||||||
|
try:
|
||||||
|
hl = hl_xpath(x)[0]
|
||||||
|
self.link_map[hl].append(span)
|
||||||
|
self.link_source_map[hl] = self.current_rels
|
||||||
|
x.set('is-link', '1')
|
||||||
|
except IndexError:
|
||||||
|
current_hyperlink = None
|
||||||
|
dest.append(span)
|
||||||
|
self.layers[p].append(x)
|
||||||
|
elif x.tag.endswith('}bookmarkStart'):
|
||||||
|
anchor = self.namespace.get(x, 'w:name')
|
||||||
|
if anchor and anchor not in self.anchor_map and anchor != '_GoBack':
|
||||||
|
# _GoBack is a special bookmark inserted by Word 2010 for
|
||||||
|
# the return to previous edit feature, we ignore it
|
||||||
|
old_anchor = current_anchor
|
||||||
|
self.anchor_map[anchor] = current_anchor = generate_anchor(anchor, frozenset(itervalues(self.anchor_map)))
|
||||||
|
if old_anchor is not None:
|
||||||
|
# The previous anchor was not applied to any element
|
||||||
|
for a, t in tuple(iteritems(self.anchor_map)):
|
||||||
|
if t == old_anchor:
|
||||||
|
self.anchor_map[a] = current_anchor
|
||||||
|
elif x.tag.endswith('}hyperlink'):
|
||||||
|
current_hyperlink = x
|
||||||
|
elif x.tag.endswith('}instrText') and x.text and x.text.strip().startswith('TOC '):
|
||||||
|
old_anchor = current_anchor
|
||||||
|
anchor = unicode_type(uuid.uuid4())
|
||||||
|
self.anchor_map[anchor] = current_anchor = generate_anchor('toc', frozenset(itervalues(self.anchor_map)))
|
||||||
|
self.toc_anchor = current_anchor
|
||||||
|
if old_anchor is not None:
|
||||||
|
# The previous anchor was not applied to any element
|
||||||
|
for a, t in tuple(iteritems(self.anchor_map)):
|
||||||
|
if t == old_anchor:
|
||||||
|
self.anchor_map[a] = current_anchor
|
||||||
|
if current_anchor is not None:
|
||||||
|
# This paragraph had no <w:r> descendants
|
||||||
|
dest.set('id', current_anchor)
|
||||||
|
current_anchor = None
|
||||||
|
|
||||||
|
m = re.match(r'heading\s+(\d+)$', style.style_name or '', re.IGNORECASE)
|
||||||
|
if m is not None:
|
||||||
|
n = min(6, max(1, int(m.group(1))))
|
||||||
|
dest.tag = 'h%d' % n
|
||||||
|
dest.set('data-heading-level', unicode_type(n))
|
||||||
|
|
||||||
|
if style.bidi is True:
|
||||||
|
dest.set('dir', 'rtl')
|
||||||
|
|
||||||
|
border_runs = []
|
||||||
|
common_borders = []
|
||||||
|
for span in dest:
|
||||||
|
run = self.object_map[span]
|
||||||
|
style = self.styles.resolve_run(run)
|
||||||
|
if not border_runs or border_runs[-1][1].same_border(style):
|
||||||
|
border_runs.append((span, style))
|
||||||
|
elif border_runs:
|
||||||
|
if len(border_runs) > 1:
|
||||||
|
common_borders.append(border_runs)
|
||||||
|
border_runs = []
|
||||||
|
|
||||||
|
for border_run in common_borders:
|
||||||
|
spans = []
|
||||||
|
bs = {}
|
||||||
|
for span, style in border_run:
|
||||||
|
style.get_border_css(bs)
|
||||||
|
style.clear_border_css()
|
||||||
|
spans.append(span)
|
||||||
|
if bs:
|
||||||
|
cls = self.styles.register(bs, 'text_border')
|
||||||
|
wrapper = self.wrap_elems(spans, SPAN())
|
||||||
|
wrapper.set('class', cls)
|
||||||
|
|
||||||
|
if not dest.text and len(dest) == 0 and not style.has_visible_border():
|
||||||
|
# Empty paragraph add a non-breaking space so that it is rendered
|
||||||
|
# by WebKit
|
||||||
|
dest.text = NBSP
|
||||||
|
|
||||||
|
# If the last element in a block is a <br> the <br> is not rendered in
|
||||||
|
# HTML, unless it is followed by a trailing space. Word, on the other
|
||||||
|
# hand inserts a blank line for trailing <br>s.
|
||||||
|
if len(dest) > 0 and not dest[-1].tail:
|
||||||
|
if dest[-1].tag == 'br':
|
||||||
|
dest[-1].tail = NBSP
|
||||||
|
elif len(dest[-1]) > 0 and dest[-1][-1].tag == 'br' and not dest[-1][-1].tail:
|
||||||
|
dest[-1][-1].tail = NBSP
|
||||||
|
|
||||||
|
return dest
|
||||||
|
|
||||||
|
def wrap_elems(self, elems, wrapper):
|
||||||
|
p = elems[0].getparent()
|
||||||
|
idx = p.index(elems[0])
|
||||||
|
p.insert(idx, wrapper)
|
||||||
|
wrapper.tail = elems[-1].tail
|
||||||
|
elems[-1].tail = None
|
||||||
|
for elem in elems:
|
||||||
|
try:
|
||||||
|
p.remove(elem)
|
||||||
|
except ValueError:
|
||||||
|
# Probably a hyperlink that spans multiple
|
||||||
|
# paragraphs,theoretically we should break this up into
|
||||||
|
# multiple hyperlinks, but I can't be bothered.
|
||||||
|
elem.getparent().remove(elem)
|
||||||
|
wrapper.append(elem)
|
||||||
|
return wrapper
|
||||||
|
|
||||||
|
def resolve_links(self):
|
||||||
|
self.resolved_link_map = {}
|
||||||
|
for hyperlink, spans in iteritems(self.link_map):
|
||||||
|
relationships_by_id = self.link_source_map[hyperlink]
|
||||||
|
span = spans[0]
|
||||||
|
if len(spans) > 1:
|
||||||
|
span = self.wrap_elems(spans, SPAN())
|
||||||
|
span.tag = 'a'
|
||||||
|
self.resolved_link_map[hyperlink] = span
|
||||||
|
tgt = self.namespace.get(hyperlink, 'w:tgtFrame')
|
||||||
|
if tgt:
|
||||||
|
span.set('target', tgt)
|
||||||
|
tt = self.namespace.get(hyperlink, 'w:tooltip')
|
||||||
|
if tt:
|
||||||
|
span.set('title', tt)
|
||||||
|
rid = self.namespace.get(hyperlink, 'r:id')
|
||||||
|
if rid and rid in relationships_by_id:
|
||||||
|
span.set('href', relationships_by_id[rid])
|
||||||
|
continue
|
||||||
|
anchor = self.namespace.get(hyperlink, 'w:anchor')
|
||||||
|
if anchor and anchor in self.anchor_map:
|
||||||
|
span.set('href', '#' + self.anchor_map[anchor])
|
||||||
|
continue
|
||||||
|
self.log.warn('Hyperlink with unknown target (rid=%s, anchor=%s), ignoring' %
|
||||||
|
(rid, anchor))
|
||||||
|
# hrefs that point nowhere give epubcheck a hernia. The element
|
||||||
|
# should be styled explicitly by Word anyway.
|
||||||
|
# span.set('href', '#')
|
||||||
|
rmap = {v:k for k, v in iteritems(self.object_map)}
|
||||||
|
for hyperlink, runs in self.fields.hyperlink_fields:
|
||||||
|
spans = [rmap[r] for r in runs if r in rmap]
|
||||||
|
if not spans:
|
||||||
|
continue
|
||||||
|
span = spans[0]
|
||||||
|
if len(spans) > 1:
|
||||||
|
span = self.wrap_elems(spans, SPAN())
|
||||||
|
span.tag = 'a'
|
||||||
|
tgt = hyperlink.get('target', None)
|
||||||
|
if tgt:
|
||||||
|
span.set('target', tgt)
|
||||||
|
tt = hyperlink.get('title', None)
|
||||||
|
if tt:
|
||||||
|
span.set('title', tt)
|
||||||
|
url = hyperlink.get('url', None)
|
||||||
|
if url is None:
|
||||||
|
anchor = hyperlink.get('anchor', None)
|
||||||
|
if anchor in self.anchor_map:
|
||||||
|
span.set('href', '#' + self.anchor_map[anchor])
|
||||||
|
continue
|
||||||
|
self.log.warn('Hyperlink field with unknown anchor: %s' % anchor)
|
||||||
|
else:
|
||||||
|
if url in self.anchor_map:
|
||||||
|
span.set('href', '#' + self.anchor_map[url])
|
||||||
|
continue
|
||||||
|
span.set('href', url)
|
||||||
|
|
||||||
|
for img, link, relationships_by_id in self.images.links:
|
||||||
|
parent = img.getparent()
|
||||||
|
idx = parent.index(img)
|
||||||
|
a = A(img)
|
||||||
|
a.tail, img.tail = img.tail, None
|
||||||
|
parent.insert(idx, a)
|
||||||
|
tgt = link.get('target', None)
|
||||||
|
if tgt:
|
||||||
|
a.set('target', tgt)
|
||||||
|
tt = link.get('title', None)
|
||||||
|
if tt:
|
||||||
|
a.set('title', tt)
|
||||||
|
rid = link['id']
|
||||||
|
if rid in relationships_by_id:
|
||||||
|
dest = relationships_by_id[rid]
|
||||||
|
if dest.startswith('#'):
|
||||||
|
if dest[1:] in self.anchor_map:
|
||||||
|
a.set('href', '#' + self.anchor_map[dest[1:]])
|
||||||
|
else:
|
||||||
|
a.set('href', dest)
|
||||||
|
|
||||||
|
def convert_run(self, run):
|
||||||
|
ans = SPAN()
|
||||||
|
self.object_map[ans] = run
|
||||||
|
text = Text(ans, 'text', [])
|
||||||
|
|
||||||
|
for child in run:
|
||||||
|
if self.namespace.is_tag(child, 'w:t'):
|
||||||
|
if not child.text:
|
||||||
|
continue
|
||||||
|
space = child.get(XML('space'), None)
|
||||||
|
preserve = False
|
||||||
|
ctext = child.text
|
||||||
|
if space != 'preserve':
|
||||||
|
# Remove leading and trailing whitespace. Word ignores
|
||||||
|
# leading and trailing whitespace without preserve
|
||||||
|
ctext = ctext.strip(' \n\r\t')
|
||||||
|
# Only use a <span> with white-space:pre-wrap if this element
|
||||||
|
# actually needs it, i.e. if it has more than one
|
||||||
|
# consecutive space or it has newlines or tabs.
|
||||||
|
multi_spaces = self.ms_pat.search(ctext) is not None
|
||||||
|
preserve = multi_spaces or self.ws_pat.search(ctext) is not None
|
||||||
|
if preserve:
|
||||||
|
text.add_elem(SPAN(ctext, style="white-space:pre-wrap"))
|
||||||
|
ans.append(text.elem)
|
||||||
|
else:
|
||||||
|
text.buf.append(ctext)
|
||||||
|
elif self.namespace.is_tag(child, 'w:cr'):
|
||||||
|
text.add_elem(BR())
|
||||||
|
ans.append(text.elem)
|
||||||
|
elif self.namespace.is_tag(child, 'w:br'):
|
||||||
|
typ = self.namespace.get(child, 'w:type')
|
||||||
|
if typ in {'column', 'page'}:
|
||||||
|
br = BR(style='page-break-after:always')
|
||||||
|
else:
|
||||||
|
clear = child.get('clear', None)
|
||||||
|
if clear in {'all', 'left', 'right'}:
|
||||||
|
br = BR(style='clear:%s'%('both' if clear == 'all' else clear))
|
||||||
|
else:
|
||||||
|
br = BR()
|
||||||
|
text.add_elem(br)
|
||||||
|
ans.append(text.elem)
|
||||||
|
elif self.namespace.is_tag(child, 'w:drawing') or self.namespace.is_tag(child, 'w:pict'):
|
||||||
|
for img in self.images.to_html(child, self.current_page, self.docx, self.dest_dir):
|
||||||
|
text.add_elem(img)
|
||||||
|
ans.append(text.elem)
|
||||||
|
elif self.namespace.is_tag(child, 'w:footnoteReference') or self.namespace.is_tag(child, 'w:endnoteReference'):
|
||||||
|
anchor, name = self.footnotes.get_ref(child)
|
||||||
|
if anchor and name:
|
||||||
|
l = A(name, id='back_%s' % anchor, href='#' + anchor, title=name)
|
||||||
|
l.set('class', 'noteref')
|
||||||
|
text.add_elem(l)
|
||||||
|
ans.append(text.elem)
|
||||||
|
elif self.namespace.is_tag(child, 'w:tab'):
|
||||||
|
spaces = int(math.ceil((self.settings.default_tab_stop / 36) * 6))
|
||||||
|
text.add_elem(SPAN(NBSP * spaces))
|
||||||
|
ans.append(text.elem)
|
||||||
|
ans[-1].set('class', 'tab')
|
||||||
|
elif self.namespace.is_tag(child, 'w:noBreakHyphen'):
|
||||||
|
text.buf.append('\u2011')
|
||||||
|
elif self.namespace.is_tag(child, 'w:softHyphen'):
|
||||||
|
text.buf.append('\u00ad')
|
||||||
|
if text.buf:
|
||||||
|
setattr(text.elem, text.attr, ''.join(text.buf))
|
||||||
|
|
||||||
|
style = self.styles.resolve_run(run)
|
||||||
|
if style.vert_align in {'superscript', 'subscript'}:
|
||||||
|
if ans.text or len(ans):
|
||||||
|
ans.set('data-docx-vert', 'sup' if style.vert_align == 'superscript' else 'sub')
|
||||||
|
if style.lang is not inherit:
|
||||||
|
lang = html_lang(style.lang)
|
||||||
|
if lang is not None and lang != self.doc_lang:
|
||||||
|
ans.set('lang', lang)
|
||||||
|
if style.rtl is True:
|
||||||
|
ans.set('dir', 'rtl')
|
||||||
|
if is_symbol_font(style.font_family):
|
||||||
|
for elem in text:
|
||||||
|
if elem.text:
|
||||||
|
elem.text = map_symbol_text(elem.text, style.font_family)
|
||||||
|
if elem.tail:
|
||||||
|
elem.tail = map_symbol_text(elem.tail, style.font_family)
|
||||||
|
style.font_family = 'sans-serif'
|
||||||
|
return ans
|
||||||
|
|
||||||
|
def add_frame(self, html_obj, style):
|
||||||
|
last_run = self.framed[-1]
|
||||||
|
if style is inherit:
|
||||||
|
if last_run:
|
||||||
|
self.framed.append([])
|
||||||
|
return
|
||||||
|
|
||||||
|
if last_run:
|
||||||
|
if last_run[-1][1] == style:
|
||||||
|
last_run.append((html_obj, style))
|
||||||
|
else:
|
||||||
|
self.framed[-1].append((html_obj, style))
|
||||||
|
else:
|
||||||
|
last_run.append((html_obj, style))
|
||||||
|
|
||||||
|
def apply_frames(self):
|
||||||
|
for run in filter(None, self.framed):
|
||||||
|
style = run[0][1]
|
||||||
|
paras = tuple(x[0] for x in run)
|
||||||
|
parent = paras[0].getparent()
|
||||||
|
idx = parent.index(paras[0])
|
||||||
|
frame = DIV(*paras)
|
||||||
|
parent.insert(idx, frame)
|
||||||
|
self.framed_map[frame] = css = style.css(self.page_map[self.object_map[paras[0]]])
|
||||||
|
self.styles.register(css, 'frame')
|
||||||
|
|
||||||
|
if not self.block_runs:
|
||||||
|
return
|
||||||
|
rmap = {v:k for k, v in iteritems(self.object_map)}
|
||||||
|
for border_style, blocks in self.block_runs:
|
||||||
|
paras = tuple(rmap[p] for p in blocks)
|
||||||
|
for p in paras:
|
||||||
|
if p.tag == 'li':
|
||||||
|
has_li = True
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
has_li = False
|
||||||
|
parent = paras[0].getparent()
|
||||||
|
if parent.tag in ('ul', 'ol'):
|
||||||
|
ul = parent
|
||||||
|
parent = ul.getparent()
|
||||||
|
idx = parent.index(ul)
|
||||||
|
frame = DIV(ul)
|
||||||
|
elif has_li:
|
||||||
|
def top_level_tag(x):
|
||||||
|
while True:
|
||||||
|
q = x.getparent()
|
||||||
|
if q is parent or q is None:
|
||||||
|
break
|
||||||
|
x = q
|
||||||
|
return x
|
||||||
|
paras = tuple(map(top_level_tag, paras))
|
||||||
|
idx = parent.index(paras[0])
|
||||||
|
frame = DIV(*paras)
|
||||||
|
else:
|
||||||
|
idx = parent.index(paras[0])
|
||||||
|
frame = DIV(*paras)
|
||||||
|
parent.insert(idx, frame)
|
||||||
|
self.framed_map[frame] = css = border_style.css
|
||||||
|
self.styles.register(css, 'frame')
|
||||||
|
|
||||||
|
def mark_block_runs(self, paras):
|
||||||
|
|
||||||
|
def process_run(run):
|
||||||
|
max_left = max_right = 0
|
||||||
|
has_visible_border = None
|
||||||
|
for p in run:
|
||||||
|
style = self.styles.resolve_paragraph(p)
|
||||||
|
if has_visible_border is None:
|
||||||
|
has_visible_border = style.has_visible_border()
|
||||||
|
if isinstance(style.margin_left, numbers.Number):
|
||||||
|
max_left = max(style.margin_left, max_left)
|
||||||
|
if isinstance(style.margin_right, numbers.Number):
|
||||||
|
max_right = max(style.margin_right, max_right)
|
||||||
|
if has_visible_border:
|
||||||
|
style.margin_left = style.margin_right = inherit
|
||||||
|
if p is not run[0]:
|
||||||
|
style.padding_top = 0
|
||||||
|
else:
|
||||||
|
border_style = style.clone_border_styles()
|
||||||
|
if has_visible_border:
|
||||||
|
border_style.margin_top, style.margin_top = style.margin_top, inherit
|
||||||
|
if p is not run[-1]:
|
||||||
|
style.padding_bottom = 0
|
||||||
|
else:
|
||||||
|
if has_visible_border:
|
||||||
|
border_style.margin_bottom, style.margin_bottom = style.margin_bottom, inherit
|
||||||
|
style.clear_borders()
|
||||||
|
if p is not run[-1]:
|
||||||
|
style.apply_between_border()
|
||||||
|
if has_visible_border:
|
||||||
|
border_style.margin_left, border_style.margin_right = max_left,max_right
|
||||||
|
self.block_runs.append((border_style, run))
|
||||||
|
|
||||||
|
run = []
|
||||||
|
for p in paras:
|
||||||
|
if run and self.frame_map.get(p) == self.frame_map.get(run[-1]):
|
||||||
|
style = self.styles.resolve_paragraph(p)
|
||||||
|
last_style = self.styles.resolve_paragraph(run[-1])
|
||||||
|
if style.has_identical_borders(last_style):
|
||||||
|
run.append(p)
|
||||||
|
continue
|
||||||
|
if len(run) > 1:
|
||||||
|
process_run(run)
|
||||||
|
run = [p]
|
||||||
|
if len(run) > 1:
|
||||||
|
process_run(run)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
import shutil
|
||||||
|
from calibre.utils.logging import default_log
|
||||||
|
default_log.filter_level = default_log.DEBUG
|
||||||
|
dest_dir = os.path.join(getcwd(), 'docx_input')
|
||||||
|
if os.path.exists(dest_dir):
|
||||||
|
shutil.rmtree(dest_dir)
|
||||||
|
os.mkdir(dest_dir)
|
||||||
|
Convert(sys.argv[-1], dest_dir=dest_dir, log=default_log)()
|
||||||
143
ebook_converter/ebooks/docx/toc.py
Normal file
@@ -0,0 +1,143 @@
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2013, Kovid Goyal <kovid at kovidgoyal.net>'

from collections import namedtuple
from itertools import count

from lxml.etree import tostring

from calibre.ebooks.metadata.toc import TOC
from calibre.ebooks.oeb.polish.toc import elem_to_toc_text
from polyglot.builtins import iteritems, range


def from_headings(body, log, namespace, num_levels=3):
    ' Create a TOC from headings in the document '
    tocroot = TOC()
    all_heading_nodes = body.xpath('//*[@data-heading-level]')
    level_prev = {i+1:None for i in range(num_levels)}
    level_prev[0] = tocroot
    level_item_map = {i:frozenset(
        x for x in all_heading_nodes if int(x.get('data-heading-level')) == i)
        for i in range(1, num_levels+1)}
    item_level_map = {e:i for i, elems in iteritems(level_item_map) for e in elems}

    idcount = count()

    def ensure_id(elem):
        ans = elem.get('id', None)
        if not ans:
            ans = 'toc_id_%d' % (next(idcount) + 1)
            elem.set('id', ans)
        return ans

    for item in all_heading_nodes:
        lvl = plvl = item_level_map.get(item, None)
        if lvl is None:
            continue
        parent = None
        while parent is None:
            plvl -= 1
            parent = level_prev[plvl]
        lvl = plvl + 1
        elem_id = ensure_id(item)
        text = elem_to_toc_text(item)
        toc = parent.add_item('index.html', elem_id, text)
        level_prev[lvl] = toc
        for i in range(lvl+1, num_levels+1):
            level_prev[i] = None

    if len(tuple(tocroot.flat())) > 1:
        log('Generating Table of Contents from headings')
        return tocroot


def structure_toc(entries):
    indent_vals = sorted({x.indent for x in entries})
    last_found = [None for i in indent_vals]
    newtoc = TOC()

    if len(indent_vals) > 6:
        for x in entries:
            newtoc.add_item('index.html', x.anchor, x.text)
        return newtoc

    def find_parent(level):
        candidates = last_found[:level]
        for x in reversed(candidates):
            if x is not None:
                return x
        return newtoc

    for item in entries:
        level = indent_vals.index(item.indent)
        parent = find_parent(level)
        last_found[level] = parent.add_item('index.html', item.anchor,
                                            item.text)
        for i in range(level+1, len(last_found)):
            last_found[i] = None

    return newtoc


def link_to_txt(a, styles, object_map):
    if len(a) > 1:
        for child in a:
            run = object_map.get(child, None)
            if run is not None:
                rs = styles.resolve(run)
                if rs.css.get('display', None) == 'none':
                    a.remove(child)

    return tostring(a, method='text', with_tail=False, encoding='unicode').strip()


def from_toc(docx, link_map, styles, object_map, log, namespace):
    XPath, get, ancestor = namespace.XPath, namespace.get, namespace.ancestor
    toc_level = None
    level = 0
    TI = namedtuple('TI', 'text anchor indent')
    toc = []
    for tag in XPath('//*[(@w:fldCharType and name()="w:fldChar") or name()="w:hyperlink" or name()="w:instrText"]')(docx):
        n = tag.tag.rpartition('}')[-1]
        if n == 'fldChar':
            t = get(tag, 'w:fldCharType')
            if t == 'begin':
                level += 1
            elif t == 'end':
                level -= 1
                if toc_level is not None and level < toc_level:
                    break
        elif n == 'instrText':
            if level > 0 and tag.text and tag.text.strip().startswith('TOC '):
                toc_level = level
        elif n == 'hyperlink':
            if toc_level is not None and level >= toc_level and tag in link_map:
                a = link_map[tag]
                href = a.get('href', None)
                txt = link_to_txt(a, styles, object_map)
                p = ancestor(tag, 'w:p')
                if txt and href and p is not None:
                    ps = styles.resolve_paragraph(p)
                    try:
                        ml = int(ps.margin_left[:-2])
                    except (TypeError, ValueError, AttributeError):
                        ml = 0
                    if ps.text_align in {'center', 'right'}:
                        ml = 0
                    toc.append(TI(txt, href[1:], ml))
    if toc:
        log('Found Word Table of Contents, using it to generate the Table of Contents')
        return structure_toc(toc)


def create_toc(docx, body, link_map, styles, object_map, log, namespace):
    ans = from_toc(docx, link_map, styles, object_map, log, namespace) or from_headings(body, log, namespace)
    # Remove heading level attributes
    for h in body.xpath('//*[@data-heading-level]'):
        del h.attrib['data-heading-level']
    return ans
7
ebook_converter/ebooks/html/__init__.py
Normal file
@@ -0,0 +1,7 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
258
ebook_converter/ebooks/html/input.py
Normal file
@@ -0,0 +1,258 @@
|
|||||||
|
#!/usr/bin/env python2
|
||||||
|
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
|
||||||
|
from __future__ import absolute_import, division, print_function, unicode_literals
|
||||||
|
|
||||||
|
__license__ = 'GPL v3'
|
||||||
|
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
|
||||||
|
__docformat__ = 'restructuredtext en'
|
||||||
|
|
||||||
|
|
||||||
|
'''
|
||||||
|
Input plugin for HTML or OPF ebooks.
|
||||||
|
'''
|
||||||
|
|
||||||
|
import os, re, sys, errno as gerrno
|
||||||
|
|
||||||
|
from calibre.ebooks.oeb.base import urlunquote
|
||||||
|
from calibre.ebooks.chardet import detect_xml_encoding
|
||||||
|
from calibre.constants import iswindows
|
||||||
|
from calibre import unicode_path, as_unicode, replace_entities
|
||||||
|
from polyglot.builtins import is_py3, unicode_type
|
||||||
|
from polyglot.urllib import urlparse, urlunparse
|
||||||
|
|
||||||
|
|
||||||
|
class Link(object):
|
||||||
|
|
||||||
|
'''
|
||||||
|
Represents a link in a HTML file.
|
||||||
|
'''
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def url_to_local_path(cls, url, base):
|
||||||
|
path = url.path
|
||||||
|
isabs = False
|
||||||
|
if iswindows and path.startswith('/'):
|
||||||
|
path = path[1:]
|
||||||
|
isabs = True
|
||||||
|
path = urlunparse(('', '', path, url.params, url.query, ''))
|
||||||
|
path = urlunquote(path)
|
||||||
|
if isabs or os.path.isabs(path):
|
||||||
|
return path
|
||||||
|
return os.path.abspath(os.path.join(base, path))
|
||||||
|
|
||||||
|
def __init__(self, url, base):
|
||||||
|
'''
|
||||||
|
:param url: The url this link points to. Must be an unquoted unicode string.
|
||||||
|
:param base: The base directory that relative URLs are with respect to.
|
||||||
|
Must be a unicode string.
|
||||||
|
'''
|
||||||
|
assert isinstance(url, unicode_type) and isinstance(base, unicode_type)
|
||||||
|
self.url = url
|
||||||
|
self.parsed_url = urlparse(self.url)
|
||||||
|
self.is_local = self.parsed_url.scheme in ('', 'file')
|
||||||
|
self.is_internal = self.is_local and not bool(self.parsed_url.path)
|
||||||
|
self.path = None
|
||||||
|
self.fragment = urlunquote(self.parsed_url.fragment)
|
||||||
|
if self.is_local and not self.is_internal:
|
||||||
|
self.path = self.url_to_local_path(self.parsed_url, base)
|
||||||
|
|
||||||
|
def __hash__(self):
|
||||||
|
if self.path is None:
|
||||||
|
return hash(self.url)
|
||||||
|
return hash(self.path)
|
||||||
|
|
||||||
|
def __eq__(self, other):
|
||||||
|
return self.path == getattr(other, 'path', other)
|
||||||
|
|
||||||
|
def __str__(self):
|
||||||
|
return 'Link: %s --> %s'%(self.url, self.path)
|
||||||
|
|
||||||
|
if not is_py3:
|
||||||
|
__unicode__ = __str__
|
||||||
|
|
||||||
|
|
||||||
|
class IgnoreFile(Exception):
|
||||||
|
|
||||||
|
def __init__(self, msg, errno):
|
||||||
|
Exception.__init__(self, msg)
|
||||||
|
self.doesnt_exist = errno == gerrno.ENOENT
|
||||||
|
self.errno = errno
|
||||||
|
|
||||||
|
|
||||||
|
class HTMLFile(object):
|
||||||
|
|
||||||
|
'''
|
||||||
|
Contains basic information about an HTML file. This
|
||||||
|
includes a list of links to other files as well as
|
||||||
|
the encoding of each file. Also tries to detect if the file is not a HTML
|
||||||
|
file in which case :member:`is_binary` is set to True.
|
||||||
|
|
||||||
|
The encoding of the file is available as :member:`encoding`.
|
||||||
|
'''
|
||||||
|
|
||||||
|
HTML_PAT = re.compile(r'<\s*html', re.IGNORECASE)
|
||||||
|
TITLE_PAT = re.compile('<title>([^<>]+)</title>', re.IGNORECASE)
|
||||||
|
LINK_PAT = re.compile(
|
||||||
|
r'<\s*a\s+.*?href\s*=\s*(?:(?:"(?P<url1>[^"]+)")|(?:\'(?P<url2>[^\']+)\')|(?P<url3>[^\s>]+))',
|
||||||
|
re.DOTALL|re.IGNORECASE)
|
||||||
|
|
||||||
|
def __init__(self, path_to_html_file, level, encoding, verbose, referrer=None):
|
||||||
|
'''
|
||||||
|
:param level: The level of this file. Should be 0 for the root file.
|
||||||
|
:param encoding: Use `encoding` to decode HTML.
|
||||||
|
:param referrer: The :class:`HTMLFile` that first refers to this file.
|
||||||
|
'''
|
||||||
|
self.path = unicode_path(path_to_html_file, abs=True)
|
||||||
|
self.title = os.path.splitext(os.path.basename(self.path))[0]
|
||||||
|
self.base = os.path.dirname(self.path)
|
||||||
|
self.level = level
|
||||||
|
self.referrer = referrer
|
||||||
|
self.links = []
|
||||||
|
|
||||||
|
try:
|
||||||
|
with open(self.path, 'rb') as f:
|
||||||
|
src = header = f.read(4096)
|
||||||
|
encoding = detect_xml_encoding(src)[1]
|
||||||
|
if encoding:
|
||||||
|
try:
|
||||||
|
header = header.decode(encoding)
|
||||||
|
except ValueError:
|
||||||
|
pass
|
||||||
|
self.is_binary = level > 0 and not bool(self.HTML_PAT.search(header))
|
||||||
|
if not self.is_binary:
|
||||||
|
src += f.read()
|
||||||
|
except IOError as err:
|
||||||
|
msg = 'Could not read from file: %s with error: %s'%(self.path, as_unicode(err))
|
||||||
|
if level == 0:
|
||||||
|
raise IOError(msg)
|
||||||
|
raise IgnoreFile(msg, err.errno)
|
||||||
|
|
||||||
|
if not src:
|
||||||
|
if level == 0:
|
||||||
|
raise ValueError('The file %s is empty'%self.path)
|
||||||
|
self.is_binary = True
|
||||||
|
|
||||||
|
if not self.is_binary:
|
||||||
|
if not encoding:
|
||||||
|
encoding = detect_xml_encoding(src[:4096], verbose=verbose)[1]
|
||||||
|
self.encoding = encoding
|
||||||
|
else:
|
||||||
|
self.encoding = encoding
|
||||||
|
|
||||||
|
src = src.decode(encoding, 'replace')
|
||||||
|
match = self.TITLE_PAT.search(src)
|
||||||
|
self.title = match.group(1) if match is not None else self.title
|
||||||
|
self.find_links(src)
|
||||||
|
|
||||||
|
def __eq__(self, other):
|
||||||
|
return self.path == getattr(other, 'path', other)
|
||||||
|
|
||||||
|
def __hash__(self):
|
||||||
|
        return hash(self.path)

    def __str__(self):
        return 'HTMLFile:%d:%s:%s' % (
            self.level, 'b' if self.is_binary else 'a', self.path)

    def __repr__(self):
        return unicode_type(self)

    def find_links(self, src):
        for match in self.LINK_PAT.finditer(src):
            url = None
            for i in ('url1', 'url2', 'url3'):
                url = match.group(i)
                if url:
                    break
            url = replace_entities(url)
            try:
                link = self.resolve(url)
            except ValueError:
                # Unparseable URL, ignore
                continue
            if link not in self.links:
                self.links.append(link)

    def resolve(self, url):
        return Link(url, self.base)


def depth_first(root, flat, visited=None):
    yield root
    if visited is None:
        visited = set()
    visited.add(root)
    for link in root.links:
        if link.path is not None and link not in visited:
            try:
                index = flat.index(link)
            except ValueError:  # Can happen if max_levels is used
                continue
            hf = flat[index]
            if hf not in visited:
                yield hf
                visited.add(hf)
                for hf in depth_first(hf, flat, visited):
                    if hf not in visited:
                        yield hf
                        visited.add(hf)


def traverse(path_to_html_file, max_levels=sys.maxsize, verbose=0, encoding=None):
    '''
    Recursively traverse all links in the HTML file.

    :param max_levels: Maximum levels of recursion. Must be non-negative. 0
                       implies that no links in the root HTML file are followed.
    :param encoding:   Specify character encoding of HTML files. If `None` it is
                       auto-detected.
    :return: A pair of lists (breadth_first, depth_first). Each list contains
             :class:`HTMLFile` objects.
    '''
    assert max_levels >= 0
    level = 0
    flat = [HTMLFile(path_to_html_file, level, encoding, verbose)]
    next_level = list(flat)
    while level < max_levels and len(next_level) > 0:
        level += 1
        nl = []
        for hf in next_level:
            rejects = []
            for link in hf.links:
                if link.path is None or link.path in flat:
                    continue
                try:
                    nf = HTMLFile(link.path, level, encoding, verbose, referrer=hf)
                    if nf.is_binary:
                        raise IgnoreFile('%s is a binary file' % nf.path, -1)
                    nl.append(nf)
                    flat.append(nf)
                except IgnoreFile as err:
                    rejects.append(link)
                    if not err.doesnt_exist or verbose > 1:
                        print(repr(err))
            for link in rejects:
                hf.links.remove(link)

        next_level = list(nl)
    orec = sys.getrecursionlimit()
    sys.setrecursionlimit(500000)
    try:
        return flat, list(depth_first(flat[0], flat))
    finally:
        sys.setrecursionlimit(orec)


def get_filelist(htmlfile, dir, opts, log):
    '''
    Build list of files referenced by html file or try to detect and use an
    OPF file instead.
    '''
    log.info('Building file list...')
    filelist = traverse(htmlfile, max_levels=int(opts.max_levels),
                        verbose=opts.verbose,
                        encoding=opts.input_encoding)[0 if opts.breadth_first else 1]
    if opts.verbose:
        log.debug('\tFound files...')
        for f in filelist:
            log.debug('\t\t', f)
    return filelist
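The traversal above builds both a breadth-first and a depth-first ordering of the linked files. A minimal standalone sketch of the same depth-first idea over a toy link graph (the `Node` class here is hypothetical, not the `HTMLFile` API above):

```python
def depth_first(root, visited=None):
    # Same shape as the generator above: yield a node, then recurse
    # into its links, skipping anything already seen.
    if visited is None:
        visited = set()
    visited.add(root)
    yield root
    for link in root.links:
        if link not in visited:
            yield from depth_first(link, visited)


class Node(object):
    def __init__(self, name):
        self.name = name
        self.links = []


# A links to B and C; B links to D -> depth-first order is A, B, D, C
a, b, c, d = Node('A'), Node('B'), Node('C'), Node('D')
a.links = [b, c]
b.links = [d]
order = [n.name for n in depth_first(a)]
print(order)  # ['A', 'B', 'D', 'C']
```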
122
ebook_converter/ebooks/html/to_zip.py
Normal file
@@ -0,0 +1,122 @@
#!/usr/bin/env python2
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2011, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

import textwrap, os, glob

from calibre.customize import FileTypePlugin
from calibre.constants import numeric_version
from polyglot.builtins import unicode_type


class HTML2ZIP(FileTypePlugin):
    name = 'HTML to ZIP'
    author = 'Kovid Goyal'
    description = textwrap.dedent(_('''\
            Follow all local links in an HTML file and create a ZIP \
            file containing all linked files. This plugin is run \
            every time you add an HTML file to the library.\
            '''))
    version = numeric_version
    file_types = {'html', 'htm', 'xhtml', 'xhtm', 'shtm', 'shtml'}
    supported_platforms = ['windows', 'osx', 'linux']
    on_import = True

    def run(self, htmlfile):
        import codecs
        from calibre import prints
        from calibre.ptempfile import TemporaryDirectory
        from calibre.gui2.convert.gui_conversion import gui_convert
        from calibre.customize.conversion import OptionRecommendation
        from calibre.ebooks.epub import initialize_container

        with TemporaryDirectory('_plugin_html2zip') as tdir:
            recs = [('debug_pipeline', tdir, OptionRecommendation.HIGH)]
            recs.append(['keep_ligatures', True, OptionRecommendation.HIGH])
            if self.site_customization and self.site_customization.strip():
                sc = self.site_customization.strip()
                enc, _, bf = sc.partition('|')
                if enc:
                    try:
                        codecs.lookup(enc)
                    except Exception:
                        prints('Ignoring invalid input encoding for HTML:', enc)
                    else:
                        recs.append(['input_encoding', enc,
                                     OptionRecommendation.HIGH])
                if bf == 'bf':
                    recs.append(['breadth_first', True,
                                 OptionRecommendation.HIGH])
            gui_convert(htmlfile, tdir, recs, abort_after_input_dump=True)
            of = self.temporary_file('_plugin_html2zip.zip')
            tdir = os.path.join(tdir, 'input')
            opf = glob.glob(os.path.join(tdir, '*.opf'))[0]
            ncx = glob.glob(os.path.join(tdir, '*.ncx'))
            if ncx:
                os.remove(ncx[0])
            epub = initialize_container(of.name, os.path.basename(opf))
            epub.add_dir(tdir)
            epub.close()

        return of.name

    def customization_help(self, gui=False):
        return _('Character encoding for the input HTML files. Common choices '
                 'include: cp1252, cp1251, latin1 and utf-8.')

    def do_user_config(self, parent=None):
        '''
        This method shows a configuration dialog for this plugin. It returns
        True if the user clicks OK, False otherwise. The changes are
        automatically applied.
        '''
        from PyQt5.Qt import (QDialog, QDialogButtonBox, QVBoxLayout,
                              QLabel, Qt, QLineEdit, QCheckBox)

        config_dialog = QDialog(parent)
        button_box = QDialogButtonBox(QDialogButtonBox.Ok | QDialogButtonBox.Cancel)
        v = QVBoxLayout(config_dialog)

        def size_dialog():
            config_dialog.resize(config_dialog.sizeHint())

        button_box.accepted.connect(config_dialog.accept)
        button_box.rejected.connect(config_dialog.reject)
        config_dialog.setWindowTitle(_('Customize') + ' ' + self.name)
        from calibre.customize.ui import (plugin_customization,
                                          customize_plugin)
        help_text = self.customization_help(gui=True)
        help_text = QLabel(help_text, config_dialog)
        help_text.setWordWrap(True)
        help_text.setTextInteractionFlags(
                Qt.LinksAccessibleByMouse | Qt.LinksAccessibleByKeyboard)
        help_text.setOpenExternalLinks(True)
        v.addWidget(help_text)
        bf = QCheckBox(_('Add linked files in breadth first order'))
        bf.setToolTip(_('Normally, when following links in HTML files'
            ' calibre does it depth first, i.e. if file A links to B and '
            ' C, but B links to D, the files are added in the order A, B, D, C. '
            ' With this option, they will instead be added as A, B, C, D'))
        sc = plugin_customization(self)
        if not sc:
            sc = ''
        sc = sc.strip()
        enc = sc.partition('|')[0]
        bfs = sc.partition('|')[-1]
        bf.setChecked(bfs == 'bf')
        sc = QLineEdit(enc, config_dialog)
        v.addWidget(sc)
        v.addWidget(bf)
        v.addWidget(button_box)
        size_dialog()
        config_dialog.exec_()

        if config_dialog.result() == QDialog.Accepted:
            sc = unicode_type(sc.text()).strip()
            if bf.isChecked():
                sc += '|bf'
            customize_plugin(self, sc)

        return config_dialog.result()
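The plugin stores its whole configuration as a single `encoding|bf` string in `site_customization`. A small standalone sketch of how that convention parses (plain `str.partition`, no calibre imports; the helper name is illustrative):

```python
def parse_site_customization(sc):
    # 'cp1252|bf' -> ('cp1252', True); 'utf-8' -> ('utf-8', False)
    sc = (sc or '').strip()
    enc, _, bf = sc.partition('|')
    return enc, bf == 'bf'


print(parse_site_customization('cp1252|bf'))  # ('cp1252', True)
print(parse_site_customization('utf-8'))      # ('utf-8', False)
```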
2152
ebook_converter/ebooks/html_entities.py
Normal file
File diff suppressed because it is too large
115
ebook_converter/ebooks/lrf/__init__.py
Normal file
@@ -0,0 +1,115 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
"""
This package contains logic to read and write LRF files.
The LRF file format is documented at U{http://www.sven.de/librie/Librie/LrfFormat}.
"""

from calibre.ebooks.lrf.pylrs.pylrs import Book as _Book
from calibre.ebooks.lrf.pylrs.pylrs import TextBlock, Header, \
                                           TextStyle, BlockStyle
from calibre.ebooks.lrf.fonts import FONT_FILE_MAP
from calibre.ebooks import ConversionError

__docformat__ = "epytext"


class LRFParseError(Exception):
    pass


class PRS500_PROFILE(object):
    screen_width = 600
    screen_height = 775
    dpi = 166
    # Number of pixels to subtract from screen_height when calculating
    # the height of the text area
    fudge = 0
    font_size = 10        #: Default (in pt)
    parindent = 10        #: Default (in pt)
    line_space = 1.2      #: Default (in pt)
    header_font_size = 6  #: In pt
    header_height = 30    #: In px
    default_fonts = {'sans': "Swis721 BT Roman", 'mono': "Courier10 BT Roman",
                     'serif': "Dutch801 Rm BT Roman"}

    name = 'prs500'


def find_custom_fonts(options, logger):
    from calibre.utils.fonts.scanner import font_scanner
    fonts = {'serif': None, 'sans': None, 'mono': None}

    def family(cmd):
        return cmd.split(',')[-1].strip()

    if options.serif_family:
        f = family(options.serif_family)
        fonts['serif'] = font_scanner.legacy_fonts_for_family(f)
        if not fonts['serif']:
            logger.warn('Unable to find serif family %s' % f)
    if options.sans_family:
        f = family(options.sans_family)
        fonts['sans'] = font_scanner.legacy_fonts_for_family(f)
        if not fonts['sans']:
            logger.warn('Unable to find sans family %s' % f)
    if options.mono_family:
        f = family(options.mono_family)
        fonts['mono'] = font_scanner.legacy_fonts_for_family(f)
        if not fonts['mono']:
            logger.warn('Unable to find mono family %s' % f)
    return fonts


def Book(options, logger, font_delta=0, header=None,
         profile=PRS500_PROFILE, **settings):
    from uuid import uuid4
    ps = {}
    ps['topmargin'] = options.top_margin
    ps['evensidemargin'] = options.left_margin
    ps['oddsidemargin'] = options.left_margin
    ps['textwidth'] = profile.screen_width - (options.left_margin + options.right_margin)
    ps['textheight'] = profile.screen_height - (options.top_margin + options.bottom_margin) \
            - profile.fudge
    if header:
        hdr = Header()
        hb = TextBlock(textStyle=TextStyle(align='foot',
                           fontsize=int(profile.header_font_size*10)),
                       blockStyle=BlockStyle(blockwidth=ps['textwidth']))
        hb.append(header)
        hdr.PutObj(hb)
        ps['headheight'] = profile.header_height
        ps['headsep'] = options.header_separation
        ps['header'] = hdr
        ps['topmargin'] = 0
        ps['textheight'] = profile.screen_height - (options.bottom_margin + ps['topmargin']) \
                - ps['headheight'] - ps['headsep'] - profile.fudge

    fontsize = int(10*profile.font_size + font_delta*20)
    baselineskip = fontsize + 20
    fonts = find_custom_fonts(options, logger)
    tsd = dict(fontsize=fontsize,
               parindent=int(10*profile.parindent),
               linespace=int(10*profile.line_space),
               baselineskip=baselineskip,
               wordspace=10*options.wordspace)
    if fonts['serif'] and 'normal' in fonts['serif']:
        tsd['fontfacename'] = fonts['serif']['normal'][1]

    book = _Book(textstyledefault=tsd,
                 pagestyledefault=ps,
                 blockstyledefault=dict(blockwidth=ps['textwidth']),
                 bookid=uuid4().hex,
                 **settings)
    for family in fonts.keys():
        if fonts[family]:
            for font in fonts[family].values():
                book.embed_font(*font)
                FONT_FILE_MAP[font[1]] = font[0]

    for family in ['serif', 'sans', 'mono']:
        if not fonts[family]:
            fonts[family] = {'normal': (None, profile.default_fonts[family])}
        elif 'normal' not in fonts[family]:
            raise ConversionError('Could not find the normal version of the ' + family + ' font')
    return book, fonts
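The page-style setup in `Book` is plain margin subtraction against the device profile. A worked sketch with the PRS500 numbers above (600x775 px screen, zero fudge) and illustrative 20 px margins:

```python
# PRS500_PROFILE values from the module above; margins are illustrative.
screen_width, screen_height, fudge = 600, 775, 0
left = right = top = bottom = 20  # px

# Same arithmetic as ps['textwidth'] / ps['textheight'] in Book()
textwidth = screen_width - (left + right)
textheight = screen_height - (top + bottom) - fudge
print(textwidth, textheight)  # 560 735
```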
33
ebook_converter/ebooks/lrf/fonts.py
Normal file
@@ -0,0 +1,33 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'

from PIL import ImageFont

'''
Default fonts used in the PRS500
'''

LIBERATION_FONT_MAP = {
    'Swis721 BT Roman':     'LiberationSans-Regular',
    'Dutch801 Rm BT Roman': 'LiberationSerif-Regular',
    'Courier10 BT Roman':   'LiberationMono-Regular',
}

FONT_FILE_MAP = {}


def get_font(name, size, encoding='unic'):
    '''
    Get an ImageFont object by name.

    @param size: Font height in pixels. To convert from pts:
                 sz in pixels = (dpi/72) * size in pts
    @param encoding: Font encoding to use. E.g. 'unic', 'symbol', 'ADOB', 'ADBE', 'aprm'
    '''
    if name in LIBERATION_FONT_MAP:
        return ImageFont.truetype(
                P('fonts/liberation/%s.ttf' % LIBERATION_FONT_MAP[name]),
                size, encoding=encoding)
    elif name in FONT_FILE_MAP:
        return ImageFont.truetype(FONT_FILE_MAP[name], size, encoding=encoding)
10
ebook_converter/ebooks/lrf/html/__init__.py
Normal file
@@ -0,0 +1,10 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
"""
This package contains code to convert HTML ebooks to LRF ebooks.
"""

__docformat__ = "epytext"
__author__ = "Kovid Goyal <kovid@kovidgoyal.net>"
115
ebook_converter/ebooks/lrf/html/color_map.py
Normal file
@@ -0,0 +1,115 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'

import re

NAME_MAP = {
    'aliceblue': '#F0F8FF',
    'antiquewhite': '#FAEBD7',
    'aqua': '#00FFFF',
    'aquamarine': '#7FFFD4',
    'azure': '#F0FFFF',
    'beige': '#F5F5DC',
    'bisque': '#FFE4C4',
    'black': '#000000',
    'blanchedalmond': '#FFEBCD',
    'blue': '#0000FF',
    'brown': '#A52A2A',
    'burlywood': '#DEB887',
    'cadetblue': '#5F9EA0',
    'chartreuse': '#7FFF00',
    'chocolate': '#D2691E',
    'coral': '#FF7F50',
    'crimson': '#DC143C',
    'cyan': '#00FFFF',
    'darkblue': '#00008B',
    'darkgoldenrod': '#B8860B',
    'darkgreen': '#006400',
    'darkkhaki': '#BDB76B',
    'darkmagenta': '#8B008B',
    'darkolivegreen': '#556B2F',
    'darkorange': '#FF8C00',
    'darkorchid': '#9932CC',
    'darkred': '#8B0000',
    'darksalmon': '#E9967A',
    'darkslateblue': '#483D8B',
    'darkslategrey': '#2F4F4F',
    'darkviolet': '#9400D3',
    'deeppink': '#FF1493',
    'dodgerblue': '#1E90FF',
    'firebrick': '#B22222',
    'floralwhite': '#FFFAF0',
    'forestgreen': '#228B22',
    'fuchsia': '#FF00FF',
    'gainsboro': '#DCDCDC',
    'ghostwhite': '#F8F8FF',
    'gold': '#FFD700',
    'goldenrod': '#DAA520',
    'indianred': '#CD5C5C',
    'indigo': '#4B0082',
    'khaki': '#F0E68C',
    'lavenderblush': '#FFF0F5',
    'lawngreen': '#7CFC00',
    'lightblue': '#ADD8E6',
    'lightcoral': '#F08080',
    'lightgoldenrodyellow': '#FAFAD2',
    'lightgray': '#D3D3D3',
    'lightgrey': '#D3D3D3',
    'lightskyblue': '#87CEFA',
    'lightslategrey': '#778899',
    'lightsteelblue': '#B0C4DE',
    'lime': '#00FF00',
    'linen': '#FAF0E6',
    'magenta': '#FF00FF',
    'maroon': '#800000',
    'mediumaquamarine': '#66CDAA',
    'mediumblue': '#0000CD',
    'mediumorchid': '#BA55D3',
    'mediumpurple': '#9370D8',
    'mediumseagreen': '#3CB371',
    'mediumslateblue': '#7B68EE',
    'midnightblue': '#191970',
    'moccasin': '#FFE4B5',
    'navajowhite': '#FFDEAD',
    'navy': '#000080',
    'oldlace': '#FDF5E6',
    'olive': '#808000',
    'orange': '#FFA500',
    'orangered': '#FF4500',
    'orchid': '#DA70D6',
    'paleturquoise': '#AFEEEE',
    'papayawhip': '#FFEFD5',
    'peachpuff': '#FFDAB9',
    'powderblue': '#B0E0E6',
    'rosybrown': '#BC8F8F',
    'royalblue': '#4169E1',
    'saddlebrown': '#8B4513',
    'sandybrown': '#F4A460',
    'seashell': '#FFF5EE',
    'sienna': '#A0522D',
    'silver': '#C0C0C0',
    'skyblue': '#87CEEB',
    'slategrey': '#708090',
    'snow': '#FFFAFA',
    'springgreen': '#00FF7F',
    'violet': '#EE82EE',
    'yellowgreen': '#9ACD32'
}

hex_pat = re.compile(r'#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})')
rgb_pat = re.compile(r'rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)', re.IGNORECASE)


def lrs_color(html_color):
    hcol = html_color.lower()
    match = hex_pat.search(hcol)
    if match:
        return '0x00' + match.group(1) + match.group(2) + match.group(3)
    match = rgb_pat.search(hcol)
    if match:
        return '0x00%02x%02x%02x' % (int(match.group(1)),
                                     int(match.group(2)),
                                     int(match.group(3)))
    if hcol in NAME_MAP:
        return NAME_MAP[hcol].replace('#', '0x00')
    return '0x00000000'
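The mapping implemented by `lrs_color` is simply HTML `#RRGGBB` (or a named color) to the LRS `0x00RRGGBB` form. A self-contained sketch of that mapping, with the name table trimmed to two entries for illustration:

```python
import re

# Standalone illustration of the #RRGGBB -> 0x00RRGGBB conversion.
# NAMES is a two-entry stand-in for the full NAME_MAP above.
NAMES = {'black': '#000000', 'navy': '#000080'}
HEX = re.compile(r'#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})')


def to_lrs(html_color):
    hcol = html_color.lower()
    m = HEX.search(hcol)
    if m:
        return '0x00' + m.group(1) + m.group(2) + m.group(3)
    if hcol in NAMES:
        return NAMES[hcol].replace('#', '0x00')
    return '0x00000000'  # fall back to black


print(to_lrs('#F0F8FF'))  # 0x00f0f8ff
print(to_lrs('navy'))     # 0x00000080
```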
1951
ebook_converter/ebooks/lrf/html/convert_from.py
Normal file
File diff suppressed because it is too large
386
ebook_converter/ebooks/lrf/html/table.py
Normal file
@@ -0,0 +1,386 @@
from __future__ import absolute_import, division, print_function, unicode_literals

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'

import math, sys, re, numbers

from calibre.ebooks.lrf.fonts import get_font
from calibre.ebooks.lrf.pylrs.pylrs import TextBlock, Text, CR, Span, \
                                           CharButton, Plot, Paragraph, \
                                           LrsTextTag
from polyglot.builtins import string_or_bytes, range, native_string_type


def ceil(num):
    return int(math.ceil(num))


def print_xml(elem):
    from calibre.ebooks.lrf.pylrs.pylrs import ElementWriter
    elem = elem.toElement(native_string_type('utf8'))
    ew = ElementWriter(elem, sourceEncoding=native_string_type('utf8'))
    ew.write(sys.stdout)
    print()


def cattrs(base, extra):
    new = base.copy()
    new.update(extra)
    return new


def tokens(tb):
    '''
    Return the next token. A token is:
    1. A string -- a block of text that has the same style
    '''
    def process_element(x, attrs):
        if isinstance(x, CR):
            yield 2, None
        elif isinstance(x, Text):
            yield x.text, cattrs(attrs, {})
        elif isinstance(x, string_or_bytes):
            yield x, cattrs(attrs, {})
        elif isinstance(x, (CharButton, LrsTextTag)):
            if x.contents:
                if hasattr(x.contents[0], 'text'):
                    yield x.contents[0].text, cattrs(attrs, {})
                elif hasattr(x.contents[0], 'attrs'):
                    for z in process_element(x.contents[0], x.contents[0].attrs):
                        yield z
        elif isinstance(x, Plot):
            yield x, None
        elif isinstance(x, Span):
            attrs = cattrs(attrs, x.attrs)
            for y in x.contents:
                for z in process_element(y, attrs):
                    yield z

    for i in tb.contents:
        if isinstance(i, CR):
            yield 1, None
        elif isinstance(i, Paragraph):
            for j in i.contents:
                attrs = {}
                if hasattr(j, 'attrs'):
                    attrs = j.attrs
                for k in process_element(j, attrs):
                    yield k


class Cell(object):

    def __init__(self, conv, tag, css):
        self.conv = conv
        self.tag = tag
        self.css = css
        self.text_blocks = []
        self.pwidth = -1.
        if tag.has_attr('width') and '%' in tag['width']:
            try:
                self.pwidth = float(tag['width'].replace('%', ''))
            except ValueError:
                pass
        if 'width' in css and '%' in css['width']:
            try:
                self.pwidth = float(css['width'].replace('%', ''))
            except ValueError:
                pass
        if self.pwidth > 100:
            self.pwidth = -1
        self.rowspan = self.colspan = 1
        try:
            self.colspan = int(tag['colspan']) if tag.has_attr('colspan') else 1
            self.rowspan = int(tag['rowspan']) if tag.has_attr('rowspan') else 1
        except Exception:
            pass

        pp = conv.current_page
        conv.book.allow_new_page = False
        conv.current_page = conv.book.create_page()
        conv.parse_tag(tag, css)
        conv.end_current_block()
        for item in conv.current_page.contents:
            if isinstance(item, TextBlock):
                self.text_blocks.append(item)
        conv.current_page = pp
        conv.book.allow_new_page = True
        if not self.text_blocks:
            tb = conv.book.create_text_block()
            tb.Paragraph(' ')
            self.text_blocks.append(tb)
        for tb in self.text_blocks:
            tb.parent = None
            tb.objId = 0
            # Needed as we have to eventually change this BlockStyle's width and
            # height attributes. This blockstyle may be shared with other
            # elements, so doing that causes havoc.
            tb.blockStyle = conv.book.create_block_style()
            ts = conv.book.create_text_style(**tb.textStyle.attrs)
            ts.attrs['parindent'] = 0
            tb.textStyle = ts
            if ts.attrs['align'] == 'foot':
                if isinstance(tb.contents[-1], Paragraph):
                    tb.contents[-1].append(' ')

    def pts_to_pixels(self, pts):
        pts = int(pts)
        return ceil((float(self.conv.profile.dpi)/72)*(pts/10))

    def minimum_width(self):
        return max([self.minimum_tb_width(tb) for tb in self.text_blocks])

    def minimum_tb_width(self, tb):
        ts = tb.textStyle.attrs
        default_font = get_font(ts['fontfacename'], self.pts_to_pixels(ts['fontsize']))
        parindent = self.pts_to_pixels(ts['parindent'])
        mwidth = 0
        for token, attrs in tokens(tb):
            font = default_font
            if isinstance(token, numbers.Integral):  # Handle para and line breaks
                continue
            if isinstance(token, Plot):
                return self.pts_to_pixels(token.xsize)
            ff = attrs.get('fontfacename', ts['fontfacename'])
            fs = attrs.get('fontsize', ts['fontsize'])
            if (ff, fs) != (ts['fontfacename'], ts['fontsize']):
                font = get_font(ff, self.pts_to_pixels(fs))
            if not token.strip():
                continue
            word = token.split()
            word = word[0] if word else ""
            width = font.getsize(word)[0]
            if width > mwidth:
                mwidth = width
        return parindent + mwidth + 2

    def text_block_size(self, tb, maxwidth=sys.maxsize, debug=False):
        ts = tb.textStyle.attrs
        default_font = get_font(ts['fontfacename'], self.pts_to_pixels(ts['fontsize']))
        parindent = self.pts_to_pixels(ts['parindent'])
        top, bottom, left, right = 0, 0, parindent, parindent

        def add_word(width, height, left, right, top, bottom, ls, ws):
            if left + width > maxwidth:
                left = width + ws
                top += ls
                bottom = top+ls if top+ls > bottom else bottom
            else:
                left += (width + ws)
                right = left if left > right else right
                bottom = top+ls if top+ls > bottom else bottom
            return left, right, top, bottom

        for token, attrs in tokens(tb):
            if attrs is None:
                attrs = {}
            font = default_font
            ls = self.pts_to_pixels(attrs.get('baselineskip', ts['baselineskip'])) + \
                 self.pts_to_pixels(attrs.get('linespace', ts['linespace']))
            ws = self.pts_to_pixels(attrs.get('wordspace', ts['wordspace']))
            if isinstance(token, numbers.Integral):  # Handle para and line breaks
                if top != bottom:  # Previous element not a line break
                    top = bottom
                else:
                    top += ls
                    bottom += ls
                left = parindent if token == 1 else 0
                continue
            if isinstance(token, Plot):
                width, height = self.pts_to_pixels(token.xsize), self.pts_to_pixels(token.ysize)
                left, right, top, bottom = add_word(width, height, left, right,
                                                    top, bottom, height, ws)
                continue
            ff = attrs.get('fontfacename', ts['fontfacename'])
            fs = attrs.get('fontsize', ts['fontsize'])
            if (ff, fs) != (ts['fontfacename'], ts['fontsize']):
                font = get_font(ff, self.pts_to_pixels(fs))
            for word in token.split():
                width, height = font.getsize(word)
                left, right, top, bottom = add_word(width, height, left, right,
                                                    top, bottom, ls, ws)
        return right+3+max(parindent, 10), bottom

    def text_block_preferred_width(self, tb, debug=False):
        return self.text_block_size(tb, sys.maxsize, debug=debug)[0]

    def preferred_width(self, debug=False):
        return ceil(max([self.text_block_preferred_width(i, debug=debug)
                         for i in self.text_blocks]))

    def height(self, width):
        return sum([self.text_block_size(i, width)[1] for i in self.text_blocks])


class Row(object):

    def __init__(self, conv, row, css, colpad):
        self.cells = []
        self.colpad = colpad
        cells = row.findAll(re.compile('td|th', re.IGNORECASE))
        self.targets = []
        for cell in cells:
            ccss = conv.tag_css(cell, css)[0]
            self.cells.append(Cell(conv, cell, ccss))
        for a in row.findAll(id=True) + row.findAll(name=True):
            name = a['name'] if a.has_attr('name') else a['id'] if a.has_attr('id') else None
            if name is not None:
                self.targets.append(name.replace('#', ''))

    def number_of_cells(self):
        '''Number of cells in this row. Respects colspan'''
        ans = 0
        for cell in self.cells:
            ans += cell.colspan
        return ans

    def height(self, widths):
        i, heights = 0, []
        for cell in self.cells:
            width = sum(widths[i:i+cell.colspan])
            heights.append(cell.height(width))
            i += cell.colspan
        if not heights:
            return 0
        return max(heights)

    def cell_from_index(self, col):
        i = -1
        cell = None
        for cell in self.cells:
            for k in range(0, cell.colspan):
                if i == col:
                    break
                i += 1
            if i == col:
                break
        return cell

    def minimum_width(self, col):
        cell = self.cell_from_index(col)
        if not cell:
            return 0
        return cell.minimum_width()

    def preferred_width(self, col):
        cell = self.cell_from_index(col)
        if not cell:
            return 0
        return 0 if cell.colspan > 1 else cell.preferred_width()

    def width_percent(self, col):
        cell = self.cell_from_index(col)
        if not cell:
            return -1
        return -1 if cell.colspan > 1 else cell.pwidth

    def cell_iterator(self):
        for c in self.cells:
            yield c


class Table(object):

    def __init__(self, conv, table, css, rowpad=10, colpad=10):
        self.rows = []
        self.conv = conv
        self.rowpad = rowpad
        self.colpad = colpad
        rows = table.findAll('tr')
        conv.in_table = True
        for row in rows:
            rcss = conv.tag_css(row, css)[0]
            self.rows.append(Row(conv, row, rcss, colpad))
        conv.in_table = False

    def number_of_columns(self):
        max = 0
        for row in self.rows:
            max = row.number_of_cells() if row.number_of_cells() > max else max
        return max

    def number_or_rows(self):
        return len(self.rows)

    def height(self, maxwidth):
        '''Return row heights + self.rowpad'''
        widths = self.get_widths(maxwidth)
        return sum([row.height(widths) + self.rowpad for row in self.rows]) - self.rowpad

    def minimum_width(self, col):
        return max([row.minimum_width(col) for row in self.rows])

    def width_percent(self, col):
        return max([row.width_percent(col) for row in self.rows])

    def get_widths(self, maxwidth):
        '''
        Return widths of columns + self.colpad
        '''
        rows, cols = self.number_or_rows(), self.number_of_columns()
        widths = list(range(cols))
        for c in range(cols):
            cellwidths = [0 for i in range(rows)]
            for r in range(rows):
                try:
                    cellwidths[r] = self.rows[r].preferred_width(c)
                except IndexError:
                    continue
            widths[c] = max(cellwidths)

        min_widths = [self.minimum_width(i)+10 for i in range(cols)]
        for i in range(len(widths)):
            wp = self.width_percent(i)
            if wp >= 0:
                widths[i] = max(min_widths[i], ceil((wp/100) * (maxwidth - (cols-1)*self.colpad)))

        itercount = 0

        while sum(widths) > maxwidth-((len(widths)-1)*self.colpad) and itercount < 100:
            for i in range(cols):
                widths[i] = ceil((95/100)*widths[i]) if \
                        ceil((95/100)*widths[i]) >= min_widths[i] else widths[i]
            itercount += 1

        return [i+self.colpad for i in widths]

    def blocks(self, maxwidth, maxheight):
        rows, cols = self.number_or_rows(), self.number_of_columns()
        cellmatrix = [[None for c in range(cols)] for r in range(rows)]
        rowpos = [0 for i in range(rows)]
        for r in range(rows):
            nc = self.rows[r].cell_iterator()
            try:
                while True:
                    cell = next(nc)
                    cellmatrix[r][rowpos[r]] = cell
                    rowpos[r] += cell.colspan
                    for k in range(1, cell.rowspan):
                        try:
                            rowpos[r+k] += 1
                        except IndexError:
                            break
            except StopIteration:  # No more cells in this row
                continue

        widths = self.get_widths(maxwidth)
        heights = [row.height(widths) for row in self.rows]

        xpos = [sum(widths[:i]) for i in range(cols)]
        delta = maxwidth - sum(widths)
        if delta < 0:
            delta = 0
        for r in range(len(cellmatrix)):
            yield None, 0, heights[r], 0, self.rows[r].targets
            for c in range(len(cellmatrix[r])):
                cell = cellmatrix[r][c]
                if not cell:
                    continue
                width = sum(widths[c:c+cell.colspan])-self.colpad*cell.colspan
                sypos = 0
                for tb in cell.text_blocks:
                    tb.blockStyle = self.conv.book.create_block_style(
                            blockwidth=width,
                            blockheight=cell.text_block_size(tb, width)[1],
                            blockrule='horz-fixed')

                    yield tb, xpos[c], sypos, delta, None
                    sypos += tb.blockStyle.attrs['blockheight']
||||||
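The width-fitting loop in `get_widths` above repeatedly scales every column to 95% of its current width (never below its per-column minimum) until the total fits the available width, with a 100-iteration safety cap. A minimal standalone sketch of that step (`shrink_widths` is a hypothetical helper, not part of the diff):

```python
from math import ceil

def shrink_widths(widths, min_widths, maxwidth, colpad):
    # Scale each column to 95% of its width, but never below its minimum,
    # until the columns (plus inter-column padding) fit in maxwidth.
    # Give up after 100 iterations, mirroring the cap in get_widths.
    itercount = 0
    while sum(widths) > maxwidth - (len(widths) - 1) * colpad and itercount < 100:
        for i in range(len(widths)):
            shrunk = ceil((95 / 100) * widths[i])
            if shrunk >= min_widths[i]:
                widths[i] = shrunk
        itercount += 1
    return widths
```

Note that if the minimum widths alone exceed `maxwidth`, the loop cannot converge and only the iteration cap stops it, so the result may still overflow.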
ebook_converter/ebooks/lrf/pylrs/__init__.py (new file, 7 lines)

from __future__ import absolute_import, division, print_function, unicode_literals

"""
This package contains code to generate ebooks in the SONY LRS/LRF format. It was
originally developed by Mike Higgins and has been extended and modified by Kovid
Goyal.
"""
ebook_converter/ebooks/lrf/pylrs/elements.py (new file, 78 lines)

from __future__ import absolute_import, division, print_function, unicode_literals

""" elements.py -- replacements and helpers for ElementTree """

from polyglot.builtins import unicode_type, string_or_bytes


class ElementWriter(object):

    def __init__(self, e, header=False, sourceEncoding="ascii",
                 spaceBeforeClose=True, outputEncodingName="UTF-16"):
        self.header = header
        self.e = e
        self.sourceEncoding = sourceEncoding
        self.spaceBeforeClose = spaceBeforeClose
        self.outputEncodingName = outputEncodingName

    def _encodeCdata(self, rawText):
        if isinstance(rawText, bytes):
            rawText = rawText.decode(self.sourceEncoding)

        text = rawText.replace("&", "&amp;")
        text = text.replace("<", "&lt;")
        text = text.replace(">", "&gt;")
        return text

    def _writeAttribute(self, f, name, value):
        f.write(' %s="' % unicode_type(name))
        if not isinstance(value, string_or_bytes):
            value = unicode_type(value)
        value = self._encodeCdata(value)
        value = value.replace('"', '&quot;')
        f.write(value)
        f.write('"')

    def _writeText(self, f, rawText):
        text = self._encodeCdata(rawText)
        f.write(text)

    def _write(self, f, e):
        f.write('<' + unicode_type(e.tag))

        # Element.items() returns a view in Python 3, so sort via sorted()
        attributes = sorted(e.items())
        for name, value in attributes:
            self._writeAttribute(f, name, value)

        if e.text is not None or len(e) > 0:
            f.write('>')

            if e.text:
                self._writeText(f, e.text)

            for e2 in e:
                self._write(f, e2)

            f.write('</%s>' % e.tag)
        else:
            if self.spaceBeforeClose:
                f.write(' ')
            f.write('/>')

        if e.tail is not None:
            self._writeText(f, e.tail)

    def toString(self):
        class x:
            pass
        buffer = []
        x.write = buffer.append
        self.write(x)
        return ''.join(buffer)

    def write(self, f):
        if self.header:
            f.write('<?xml version="1.0" encoding="%s"?>\n' % self.outputEncodingName)

        self._write(f, self.e)
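The XML escaping that `ElementWriter._encodeCdata` performs (decode bytes with the source encoding, then replace the three characters unsafe in character data) can be sketched as a standalone function (`encode_cdata` is a hypothetical helper for illustration, not part of the diff):

```python
def encode_cdata(raw, source_encoding="ascii"):
    # Decode bytes first, then escape &, < and > for XML character data.
    # The order matters: "&" must be escaped before the other replacements
    # introduce new "&" characters of their own.
    if isinstance(raw, bytes):
        raw = raw.decode(source_encoding)
    text = raw.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text
```

Attribute values additionally need `"` escaped to `&quot;`, which `_writeAttribute` does on top of this, since the writer wraps values in double quotes.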
Some files were not shown because too many files have changed in this diff.