diff --git a/README.md b/README.md
new file mode 100644
index 0000000..29db357
--- /dev/null
+++ b/README.md
@@ -0,0 +1,34 @@
+This repository contains tools for NGOs that organize private hosting.
+
+## Import of existing datasets
+
+- [x] https://mission-lifeline.de/unterkunft-bereitstellen/
+- [ ] https://warhelp.eu/
+- [x] https://www.dhdd.info/
+- [x] https://icanhelp.host/ (public API)
+
+## Transformation / Merging / Export
+
+- [x] edn
+- [x] json
+- [x] csv
+- [x] xlsx
+
+## A secure / robust / scalable **backend** usable by all NGOs
+
+- [ ] nixos server deployment
+- [ ] reproducible builds with nixpkgs
+- [ ] xtdb
+
+It will be inspired by [swlkup](https://github.com/johannesloetzsch/swlkup).
+
+## A frontend for NGO members authorized to **search** within the database
+
+- [ ] TODO
+
+## A customizable public form to submit new offers
+
+NGOs that are happy with their existing solution don't need to use this component.
+
+- [ ] zentralwerk & goethe institut
+- [ ] lifeline
diff --git a/import/api/wpforms-crawler/.gitignore b/import/api/wpforms-crawler/.gitignore
index 7997ddc..fc253c5 100644
--- a/import/api/wpforms-crawler/.gitignore
+++ b/import/api/wpforms-crawler/.gitignore
@@ -1,2 +1,3 @@
 config.sh
 data*
+*~
diff --git a/import/api/wpforms-crawler/README.md b/import/api/wpforms-crawler/README.md
index af7d405..694d1c6 100644
--- a/import/api/wpforms-crawler/README.md
+++ b/import/api/wpforms-crawler/README.md
@@ -1,3 +1,3 @@
-[wpforms](https://wpforms.com/) uses an counter for `ENTRY_ID`s and seems to be vulnerable against CSRF :(
+[wpforms](https://wpforms.com/) uses a counter for `ENTRY_ID`s and seems to be vulnerable to CSRF :(
 
 Once we have obtained a cookie, crawling is trivial…
diff --git a/import/api/wpforms-mails/.gitignore b/import/api/wpforms-mails/.gitignore
index 3e64a0c..08c6cc9 100644
--- a/import/api/wpforms-mails/.gitignore
+++ b/import/api/wpforms-mails/.gitignore
@@ -1,3 +1,9 @@
+*.xlsx
+*.csv
+
+.clj-kondo/
+.lsp/
+
 /target
 /classes
 /checkouts
@@ -9,5 +15,3 @@ pom.xml.asc
 /.lein-*
 /.nrepl-port
 /.prepl-port
-*.xlsx
-*.csv
diff --git a/import/api/wpforms-mails/LICENSE b/import/api/wpforms-mails/LICENSE
new file mode 100644
index 0000000..2315126
--- /dev/null
+++ b/import/api/wpforms-mails/LICENSE
@@ -0,0 +1,280 @@
+Eclipse Public License - v 2.0
+
+    THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE
+    PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION
+    OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT.
+
+1. DEFINITIONS
+
+"Contribution" means:
+
+  a) in the case of the initial Contributor, the initial content
+     Distributed under this Agreement, and
+
+  b) in the case of each subsequent Contributor:
+     i) changes to the Program, and
+     ii) additions to the Program;
+     where such changes and/or additions to the Program originate from
+     and are Distributed by that particular Contributor. A Contribution
+     "originates" from a Contributor if it was added to the Program by
+     such Contributor itself or anyone acting on such Contributor's behalf.
+     Contributions do not include changes or additions to the Program that
+     are not Modified Works.
+
+"Contributor" means any person or entity that Distributes the Program.
+
+"Licensed Patents" mean patent claims licensable by a Contributor which
+are necessarily infringed by the use or sale of its Contribution alone
+or when combined with the Program.
+
+"Program" means the Contributions Distributed in accordance with this
+Agreement.
+
+"Recipient" means anyone who receives the Program under this Agreement
+or any Secondary License (as applicable), including Contributors.
+
+"Derivative Works" shall mean any work, whether in Source Code or other
+form, that is based on (or derived from) the Program and for which the
+editorial revisions, annotations, elaborations, or other modifications
+represent, as a whole, an original work of authorship.
+
+"Modified Works" shall mean any work in Source Code or other form that
+results from an addition to, deletion from, or modification of the
+contents of the Program, including, for purposes of clarity any new file
+in Source Code form that contains any contents of the Program. Modified
+Works shall not include works that contain only declarations,
+interfaces, types, classes, structures, or files of the Program solely
+in each case in order to link to, bind by name, or subclass the Program
+or Modified Works thereof.
+
+"Distribute" means the acts of a) distributing or b) making available
+in any manner that enables the transfer of a copy.
+
+"Source Code" means the form of a Program preferred for making
+modifications, including but not limited to software source code,
+documentation source, and configuration files.
+
+"Secondary License" means either the GNU General Public License,
+Version 2.0, or any later versions of that license, including any
+exceptions or additional permissions as identified by the initial
+Contributor.
+
+2. GRANT OF RIGHTS
+
+  a) Subject to the terms of this Agreement, each Contributor hereby
+  grants Recipient a non-exclusive, worldwide, royalty-free copyright
+  license to reproduce, prepare Derivative Works of, publicly display,
+  publicly perform, Distribute and sublicense the Contribution of such
+  Contributor, if any, and such Derivative Works.
+
+  b) Subject to the terms of this Agreement, each Contributor hereby
+  grants Recipient a non-exclusive, worldwide, royalty-free patent
+  license under Licensed Patents to make, use, sell, offer to sell,
+  import and otherwise transfer the Contribution of such Contributor,
+  if any, in Source Code or other form. This patent license shall
+  apply to the combination of the Contribution and the Program if, at
+  the time the Contribution is added by the Contributor, such addition
+  of the Contribution causes such combination to be covered by the
+  Licensed Patents. The patent license shall not apply to any other
+  combinations which include the Contribution. No hardware per se is
+  licensed hereunder.
+
+  c) Recipient understands that although each Contributor grants the
+  licenses to its Contributions set forth herein, no assurances are
+  provided by any Contributor that the Program does not infringe the
+  patent or other intellectual property rights of any other entity.
+  Each Contributor disclaims any liability to Recipient for claims
+  brought by any other entity based on infringement of intellectual
+  property rights or otherwise. As a condition to exercising the
+  rights and licenses granted hereunder, each Recipient hereby
+  assumes sole responsibility to secure any other intellectual
+  property rights needed, if any. For example, if a third party
+  patent license is required to allow Recipient to Distribute the
+  Program, it is Recipient's responsibility to acquire that license
+  before distributing the Program.
+
+  d) Each Contributor represents that to its knowledge it has
+  sufficient copyright rights in its Contribution, if any, to grant
+  the copyright license set forth in this Agreement.
+
+  e) Notwithstanding the terms of any Secondary License, no
+  Contributor makes additional grants to any Recipient (other than
+  those set forth in this Agreement) as a result of such Recipient's
+  receipt of the Program under the terms of a Secondary License
+  (if permitted under the terms of Section 3).
+
+3. REQUIREMENTS
+
+3.1 If a Contributor Distributes the Program in any form, then:
+
+  a) the Program must also be made available as Source Code, in
+  accordance with section 3.2, and the Contributor must accompany
+  the Program with a statement that the Source Code for the Program
+  is available under this Agreement, and informs Recipients how to
+  obtain it in a reasonable manner on or through a medium customarily
+  used for software exchange; and
+
+  b) the Contributor may Distribute the Program under a license
+  different than this Agreement, provided that such license:
+     i) effectively disclaims on behalf of all other Contributors all
+     warranties and conditions, express and implied, including
+     warranties or conditions of title and non-infringement, and
+     implied warranties or conditions of merchantability and fitness
+     for a particular purpose;
+
+     ii) effectively excludes on behalf of all other Contributors all
+     liability for damages, including direct, indirect, special,
+     incidental and consequential damages, such as lost profits;
+
+     iii) does not attempt to limit or alter the recipients' rights
+     in the Source Code under section 3.2; and
+
+     iv) requires any subsequent distribution of the Program by any
+     party to be under a license that satisfies the requirements
+     of this section 3.
+
+3.2 When the Program is Distributed as Source Code:
+
+  a) it must be made available under this Agreement, or if the
+  Program (i) is combined with other material in a separate file or
+  files made available under a Secondary License, and (ii) the initial
+  Contributor attached to the Source Code the notice described in
+  Exhibit A of this Agreement, then the Program may be made available
+  under the terms of such Secondary Licenses, and
+
+  b) a copy of this Agreement must be included with each copy of
+  the Program.
+
+3.3 Contributors may not remove or alter any copyright, patent,
+trademark, attribution notices, disclaimers of warranty, or limitations
+of liability ("notices") contained within the Program from any copy of
+the Program which they Distribute, provided that Contributors may add
+their own appropriate notices.
+
+4. COMMERCIAL DISTRIBUTION
+
+Commercial distributors of software may accept certain responsibilities
+with respect to end users, business partners and the like. While this
+license is intended to facilitate the commercial use of the Program,
+the Contributor who includes the Program in a commercial product
+offering should do so in a manner which does not create potential
+liability for other Contributors. Therefore, if a Contributor includes
+the Program in a commercial product offering, such Contributor
+("Commercial Contributor") hereby agrees to defend and indemnify every
+other Contributor ("Indemnified Contributor") against any losses,
+damages and costs (collectively "Losses") arising from claims, lawsuits
+and other legal actions brought by a third party against the Indemnified
+Contributor to the extent caused by the acts or omissions of such
+Commercial Contributor in connection with its distribution of the Program
+in a commercial product offering. The obligations in this section do not
+apply to any claims or Losses relating to any actual or alleged
+intellectual property infringement. In order to qualify, an Indemnified
+Contributor must: a) promptly notify the Commercial Contributor in
+writing of such claim, and b) allow the Commercial Contributor to control,
+and cooperate with the Commercial Contributor in, the defense and any
+related settlement negotiations. The Indemnified Contributor may
+participate in any such claim at its own expense.
+
+For example, a Contributor might include the Program in a commercial
+product offering, Product X. That Contributor is then a Commercial
+Contributor. If that Commercial Contributor then makes performance
+claims, or offers warranties related to Product X, those performance
+claims and warranties are such Commercial Contributor's responsibility
+alone. Under this section, the Commercial Contributor would have to
+defend claims against the other Contributors related to those performance
+claims and warranties, and if a court requires any other Contributor to
+pay any damages as a result, the Commercial Contributor must pay
+those damages.
+
+5. NO WARRANTY
+
+EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, AND TO THE EXTENT
+PERMITTED BY APPLICABLE LAW, THE PROGRAM IS PROVIDED ON AN "AS IS"
+BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR
+IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF
+TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR
+PURPOSE. Each Recipient is solely responsible for determining the
+appropriateness of using and distributing the Program and assumes all
+risks associated with its exercise of rights under this Agreement,
+including but not limited to the risks and costs of program errors,
+compliance with applicable laws, damage to or loss of data, programs
+or equipment, and unavailability or interruption of operations.
+
+6. DISCLAIMER OF LIABILITY
+
+EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, AND TO THE EXTENT
+PERMITTED BY APPLICABLE LAW, NEITHER RECIPIENT NOR ANY CONTRIBUTORS
+SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION LOST
+PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE
+EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+7. GENERAL
+
+If any provision of this Agreement is invalid or unenforceable under
+applicable law, it shall not affect the validity or enforceability of
+the remainder of the terms of this Agreement, and without further
+action by the parties hereto, such provision shall be reformed to the
+minimum extent necessary to make such provision valid and enforceable.
+
+If Recipient institutes patent litigation against any entity
+(including a cross-claim or counterclaim in a lawsuit) alleging that the
+Program itself (excluding combinations of the Program with other software
+or hardware) infringes such Recipient's patent(s), then such Recipient's
+rights granted under Section 2(b) shall terminate as of the date such
+litigation is filed.
+
+All Recipient's rights under this Agreement shall terminate if it
+fails to comply with any of the material terms or conditions of this
+Agreement and does not cure such failure in a reasonable period of
+time after becoming aware of such noncompliance. If all Recipient's
+rights under this Agreement terminate, Recipient agrees to cease use
+and distribution of the Program as soon as reasonably practicable.
+However, Recipient's obligations under this Agreement and any licenses
+granted by Recipient relating to the Program shall continue and survive.
+
+Everyone is permitted to copy and distribute copies of this Agreement,
+but in order to avoid inconsistency the Agreement is copyrighted and
+may only be modified in the following manner. The Agreement Steward
+reserves the right to publish new versions (including revisions) of
+this Agreement from time to time. No one other than the Agreement
+Steward has the right to modify this Agreement. The Eclipse Foundation
+is the initial Agreement Steward. The Eclipse Foundation may assign the
+responsibility to serve as the Agreement Steward to a suitable separate
+entity. Each new version of the Agreement will be given a distinguishing
+version number. The Program (including Contributions) may always be
+Distributed subject to the version of the Agreement under which it was
+received. In addition, after a new version of the Agreement is published,
+Contributor may elect to Distribute the Program (including its
+Contributions) under the new version.
+
+Except as expressly stated in Sections 2(a) and 2(b) above, Recipient
+receives no rights or licenses to the intellectual property of any
+Contributor under this Agreement, whether expressly, by implication,
+estoppel or otherwise. All rights in the Program not expressly granted
+under this Agreement are reserved. Nothing in this Agreement is intended
+to be enforceable by any entity that is not a Contributor or Recipient.
+No third-party beneficiary rights are created under this Agreement.
+
+Exhibit A - Form of Secondary Licenses Notice
+
+"This Source Code may also be made available under the following
+Secondary Licenses when the conditions for such availability set forth
+in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public
+License as published by the Free Software Foundation, either version 2
+of the License, or (at your option) any later version, with the GNU
+Classpath Exception which is available at
+https://www.gnu.org/software/classpath/license.html."
+
+  Simply including a copy of this Agreement, including this Exhibit A
+  is not sufficient to license the Source Code under Secondary Licenses.
+
+  If it is not possible or desirable to put the notice in a particular
+  file, then You may include the notice in a location (such as a LICENSE
+  file in a relevant directory) where a recipient would be likely to
+  look for such a notice.
+
+  You may add additional accurate notices of copyright ownership.
diff --git a/import/api/wpforms-mails/README.md b/import/api/wpforms-mails/README.md
new file mode 100644
index 0000000..34f8534
--- /dev/null
+++ b/import/api/wpforms-mails/README.md
@@ -0,0 +1,11 @@
+Assume someone used [wpforms](https://wpforms.com/) on a WordPress-as-a-Service host without database access, and then `wpforms` turns out to be buggy and its download feature doesn't work. The only way of getting your data is then to extract it from the HTML mails wpforms was sending…
+
+This repo provides a quick hack to load mails via `imap` or from a local `mbox`, extract wpforms datasets by parsing the HTML mails, and store the data as `edn`/`json`/`csv`/`xlsx`.
+
+## Usage
+
+Provide the config via `.lein-env` or environment variables:
+
+```sh
+WPFORMS_MAILS_FILE="$HOME/.thunderbird/myprofileid.default/Mail/Local Folders/example_folder" lein run /tmp/output.csv
+```
diff --git a/import/api/wpforms-mails/src/data/edn/writer.clj b/import/api/wpforms-mails/src/data/edn/writer.clj
new file mode 100644
index 0000000..f92aa20
--- /dev/null
+++ b/import/api/wpforms-mails/src/data/edn/writer.clj
@@ -0,0 +1,9 @@
+(ns data.edn.writer
+  (:require [clojure.pprint :refer [pprint]]))
+
+(defn edn->pprint [edn]
+  (with-out-str (pprint edn)))
+
+(defn write-edn [file edn]
+  (->> (edn->pprint edn)
+       (spit file)))
diff --git a/import/api/wpforms-mails/src/data/table/writer.clj b/import/api/wpforms-mails/src/data/table/writer.clj
index ab02a98..b5720b9 100644
--- a/import/api/wpforms-mails/src/data/table/writer.clj
+++ b/import/api/wpforms-mails/src/data/table/writer.clj
@@ -1,6 +1,7 @@
 (ns data.table.writer
   (:require [dk.ative.docjure.spreadsheet :as xls]
             [semantic-csv.core :as sc]
+            [data.edn.writer :refer [write-edn]]
             [clojure.spec.alpha :as s]
             [clojure.string :refer [ends-with?]]))
 
@@ -8,7 +9,7 @@
 (s/def ::table-map (s/coll-of map?))
 (s/def ::table-vec (s/coll-of vector?))
 (s/def ::table (s/or :map ::table-map
-                      :vec ::table-vec))
+                     :vec ::table-vec))
 
 (defn vectorize-if-needed [table]
   (assert (s/valid? ::table table))
@@ -25,10 +26,14 @@
   ([filename table]
    (save-table! filename {} table))
   ([filename args table]
-   (assert (or (ends-with? filename ".xlsx")
-               (ends-with? filename ".csv")))
-   (if (ends-with? filename ".xlsx")
+   (cond
+     (ends-with? filename ".xlsx")
      (table2xls filename args table)
+     (ends-with? filename ".csv")
      (let [args+defaults (merge {:writer-opts {:delimiter ";"}} args)]
-       (sc/spit-csv filename args+defaults table)))
+       (sc/spit-csv filename args+defaults table))
+     (ends-with? filename ".edn")
+     (write-edn filename table)
+     :else
+     (throw (ex-info "Unsupported file extension!" {:filename filename})))
    (println "Saved " filename)))
diff --git a/import/api/wpforms-mails/src/wpforms_mails/core.clj b/import/api/wpforms-mails/src/wpforms_mails/core.clj
index dfd9139..1622560 100644
--- a/import/api/wpforms-mails/src/wpforms_mails/core.clj
+++ b/import/api/wpforms-mails/src/wpforms_mails/core.clj
@@ -47,7 +47,8 @@
                (-> message
                    message->html
                    wpforms_html->edn)))
-        rest ;; TODO filter valid entries
+        rest ;; TODO filter valid entries; in my example file, all mails except the first one are from wpforms
+        ;; We should generate and include an example mbox without sensitive data for testing…
         (save-table! filename {:workbook-name "Host Offers"}))))
 
 (comment
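The `rest` step in `core.clj` only works because, in the author's example mailbox, the first mail happens to be the only one not sent by wpforms. A predicate-based filter would drop non-wpforms mails wherever they appear. The sketch below assumes (hypothetically) that each parsed mail is a map from field label to value and that every genuine wpforms entry contains a fixed set of fields; `required-fields`, `wpforms-entry?` and `valid-entries` are invented names, and the field labels are placeholders to replace with the form's real labels.

```clojure
(ns wpforms-mails.filter-sketch
  "Hypothetical sketch: keep only parsed mails that look like wpforms entries.")

;; ASSUMPTION: placeholder labels; every real wpforms entry is assumed
;; to contain all of them. Adjust to the labels used in your form.
(def required-fields
  #{"Name" "E-Mail"})

(defn wpforms-entry?
  "True if `entry` is a map containing every field in `required-fields`."
  [entry]
  (and (map? entry)
       (every? #(contains? entry %) required-fields)))

(defn valid-entries
  "Keeps only the parsed mails that look like wpforms entries
  (drops e.g. a welcome mail or anything the parser returned as nil)."
  [entries]
  (filter wpforms-entry? entries))
```

In the `->>` pipeline of `core.clj`, `valid-entries` could take the place of `rest`, so the result no longer depends on the position of the non-wpforms mail in the mailbox.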
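The new `.edn` branch of `save-table!` delegates to `write-edn`, which serializes the table with `clojure.pprint/pprint`. For plain data (maps, vectors, strings, keywords, numbers, `nil`) that printed form is valid EDN, so such a file can be read back with `clojure.edn/read-string`. The round-trip check below is a sketch that inlines the same `pprint`-based serialization rather than requiring `data.edn.writer`; the namespace, the `roundtrip` helper and the temp-file prefix are hypothetical.

```clojure
(ns data.edn.roundtrip-sketch
  (:require [clojure.edn :as edn]
            [clojure.pprint :refer [pprint]]))

;; Same approach as data.edn.writer/edn->pprint: capture pprint's output.
(defn edn->pprint [data]
  (with-out-str (pprint data)))

(defn roundtrip
  "Writes `data` as pretty-printed EDN to a temp file, reads it back with
  clojure.edn, and deletes the file afterwards."
  [data]
  (let [file (java.io.File/createTempFile "wpforms-sketch" ".edn")]
    (try
      (spit file (edn->pprint data))
      (edn/read-string (slurp file))
      (finally
        (.delete file)))))
```

Values without an EDN representation (arbitrary Java objects, for example) would not survive this round trip, so a check like this is mainly useful as a quick test for the plain maps produced from the mails.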