AI and Data Processing Disclosure

Last Updated: 2026-06-03

This AI and Data Processing Disclosure ("Disclosure") explains how the MainBook service ("Service") provided by Human Beyond LLC ("MainBook", "we", "us", "our") uses artificial intelligence ("AI") and machine-learning ("ML") technology to process documents you upload, and the specific commitments and limitations that apply.

This Disclosure is incorporated by reference into our Terms of Service, Privacy Policy, and Data Processing Agreement. It supplements but does not replace those documents. Capitalized terms not defined here have the meanings given in the Terms of Service.

1. Summary

MainBook uses optical character recognition ("OCR") and large-language-model ("LLM") technology, provided by third-party AI vendors, to convert your bank or credit card statement PDFs into structured data formats.
The current AI sub-processors are listed in our Sub-Processors page, which is updated as our sub-processor mix changes.
We do not use your Content, your Output, or extracted financial data to train or improve our own AI models.
We engage AI sub-processors whose published terms for commercial or API customers either apply a no-training default or contractually commit the vendor not to use customer data to train its models, in each case where such a default or commitment is available.
AI is probabilistic. Output may contain errors, omissions, or hallucinated data. You must independently verify Output against the source documents before relying on it. See our Disclaimer for the full warning.

2. How the Service Uses AI

When you upload a document, the Service performs the following steps:

2.1 Receipt and storage. Your uploaded document is received over an encrypted TLS connection and stored in object storage (DigitalOcean Spaces). The document file is encrypted at rest.

2.2 Optical Character Recognition (OCR). The document is transmitted to our OCR sub-processor (Mistral AI, as of the Last Updated date above), which converts the visual content of the PDF into text and a layout-aware structured representation.

2.3 Structured extraction (LLM). The OCR output is transmitted to our LLM sub-processor (Google, via the Gemini API, as of the Last Updated date above). The LLM identifies transaction rows, extracts dates, descriptions, amounts, balances, and other relevant fields, and returns structured data. The specific model used is selected by runtime configuration and may change over time without notice; we reserve the right to choose the model best suited to a given document. We access the LLM provider on a paid (billed) basis.

2.4 Validation and quality control. The Service applies internal mathematical and logical checks (for example, comparing the sum of credits and debits against opening and closing balances) and may re-run extraction on documents that fail these checks.
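
For illustration only, the balance comparison described in Section 2.4 can be sketched as follows. This is a minimal sketch, not the Service's actual implementation: the function name `balances_reconcile`, its parameters, and the zero default tolerance are all hypothetical.

```python
from decimal import Decimal

def balances_reconcile(opening, closing, credits, debits, tolerance=Decimal("0.00")):
    """Return True when opening + credits - debits matches closing within tolerance.

    Hypothetical helper: names and the tolerance default are assumptions.
    Decimal is used so that currency amounts are not subject to float rounding.
    """
    expected = opening + sum(credits, Decimal("0")) - sum(debits, Decimal("0"))
    return abs(expected - closing) <= tolerance

# Example statement: 1000.00 + 500.25 - 350.00 = 1150.25
statement_ok = balances_reconcile(
    opening=Decimal("1000.00"),
    closing=Decimal("1150.25"),
    credits=[Decimal("500.00"), Decimal("0.25")],
    debits=[Decimal("350.00")],
)

# The same statement with the debit dropped no longer reconciles,
# which is the kind of failure that could prompt a re-run of extraction.
statement_missing_debit = balances_reconcile(
    opening=Decimal("1000.00"),
    closing=Decimal("1150.25"),
    credits=[Decimal("500.00"), Decimal("0.25")],
    debits=[],
)
```

A check of this kind can detect that something is wrong with the extracted totals, but it cannot identify which row is wrong, and offsetting errors can still pass it.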

2.5 Storage of Output. The structured Output is stored in our application database and made available to you for review, editing, and export.

2.6 Auto-deletion. Both the source document and the Output are auto-deleted ninety (90) days after upload, in accordance with our Privacy Policy.
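
For illustration only, the ninety-day retention window can be expressed as a simple date calculation. This is a sketch under stated assumptions: the names `RETENTION`, `deletion_due_at`, and `is_due_for_deletion` are hypothetical, and the use of UTC timestamps is an assumption rather than a documented detail of the Service.

```python
from datetime import datetime, timedelta, timezone

# Retention period taken from Section 2.6; everything else here is hypothetical.
RETENTION = timedelta(days=90)

def deletion_due_at(uploaded_at):
    """Return the moment at which a document uploaded at `uploaded_at` becomes due for deletion."""
    return uploaded_at + RETENTION

def is_due_for_deletion(uploaded_at, now):
    """Return True once the retention period has fully elapsed."""
    return now >= deletion_due_at(uploaded_at)

uploaded = datetime(2026, 1, 1, tzinfo=timezone.utc)
due = deletion_due_at(uploaded)  # 2026-04-01 00:00 UTC (31 + 28 + 31 days later)
```

Under these assumptions, a document uploaded on 1 January 2026 would become due for deletion on 1 April 2026.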

3. AI Sub-Processors

The current list of AI sub-processors is set out in our Sub-Processors page, which is the single source of truth for vendor identity and is updated as our vendor mix changes. As of the Last Updated date above, the primary AI sub-processors are:

Mistral AI (Paris, France). OCR provider. Processes the visual content of your uploaded document and returns a text and layout representation. Subject to Mistral's then-current published terms applicable to its API customers.

Google, via the Gemini API (United States). LLM provider. Receives the OCR output and returns the structured extraction (transaction rows, dates, descriptions, amounts, balances). We access the Gemini API directly, on a paid (billed) basis, and are therefore subject to the terms Google applies to paid Gemini API access. We do not transmit your raw uploaded document file directly to Google; Google receives the OCR output produced by Mistral AI.

If the AI sub-processors listed here change (for example, if we replace Mistral with a different OCR provider, or switch to a different LLM provider), this Disclosure will be updated and the change will be reflected in the Sub-Processors page with at least thirty (30) days' advance notice. Material changes (for example, addition of a new AI sub-processor or a change in the category of underlying providers) trigger our standard sub-processor notice mechanism described in Section 5 of our Data Processing Agreement.

4. Our "No Training" Position

4.1 We do not train our own models. MainBook does not maintain or train its own AI or ML models. We do not use your Content, your Output, or any data derived from your use of the Service to train, fine-tune, or improve any AI or ML model owned, developed, or controlled by us or by our affiliates.

4.2 Our AI sub-processors do not train on your data. We use our AI sub-processors on paid (billed) tiers that, according to their published terms applicable to API customers, do not use customer data to train their models:

Mistral AI (OCR). Mistral's published terms state that, for its API customers, "We do not use your data to train our models." Data sent through the Mistral API is not used for model training by default.
Google (Gemini API, LLM). Google's terms for paid Gemini API access state: "Google doesn't use your prompts (including associated system instructions, cached content, and files such as images, videos, or documents) or responses to improve our products." We access the Gemini API on a paid basis and so this no-training commitment applies to our use.

These commitments reflect the vendors' then-current published terms. The authoritative pages are linked from our Sub-Processors page and are available on the vendors' own legal pages. Where a sub-processor offers an opt-out from, or a default exclusion from, model training, that protection is in effect for our use of the Service.

4.3 Limits of our position. These no-training commitments are the vendors' contractual and published representations, on which we rely. We do not, and cannot, independently audit each vendor's internal data-handling practices, and a vendor may change its published terms over time. Separately, a vendor may retain inputs and outputs for a limited period for its own abuse-monitoring, security, and legal-compliance purposes (for example, Mistral retains API inputs and outputs for a limited rolling window for abuse monitoring before deletion); such limited retention is distinct from, and does not constitute, training. If a vendor materially changes its data-use terms, or if we change AI sub-processors, we will update this Disclosure as described in Sections 3 and 9. If you require a stricter or independently contracted guarantee (for example, a zero-data-retention configuration or a direct contractual no-training commitment from a specific provider), please contact us at hello@human-beyond.ai and we will discuss whether your requirements can be accommodated.

4.4 Anonymized telemetry. We collect anonymized telemetry about the operation of the Service (for example, processing duration, error rates, confidence scores, file size distributions). This telemetry does not contain your Content or your Output and is used to operate, secure, and improve the Service. It is not used to train AI models.

5. Probabilistic Nature of AI — Risk of Errors

5.1 No guarantee of accuracy. Output from the Service is generated by probabilistic AI models. Output may not be accurate, complete, reliable, or fit for any particular purpose. Errors that AI-based extraction may produce include but are not limited to:

(a) misreading numerical values (mistaking "1" for "7", transposing digits, dropping or duplicating digits);

(b) misreading or inverting dates;

(c) omitting transactions from the Output;

(d) duplicating transactions in the Output;

(e) misattributing transactions across accounts;

(f) producing fabricated or "hallucinated" entries that do not appear in the source document;

(g) misinterpreting the layout, structure, or context of the source document;

(h) producing different Output for the same input on repeated runs.
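
For illustration only, two of the error types listed above (duplicated transactions and differing Output across repeated runs) can be surfaced for human review with a simple comparison. This is a minimal sketch, not a feature of the Service: the row shape `(date, description, amount)` and the function names `find_duplicates` and `diff_runs` are hypothetical.

```python
from collections import Counter

def find_duplicates(rows):
    """Return each row that appears more than once, for manual review."""
    counts = Counter(rows)
    return [row for row, n in counts.items() if n > 1]

def diff_runs(run_a, run_b):
    """Return (rows only in run_a, rows only in run_b), respecting repeat counts."""
    a, b = Counter(run_a), Counter(run_b)
    return list((a - b).elements()), list((b - a).elements())

# Hypothetical extracted rows: (date, description, amount as a string)
run_1 = [
    ("2026-01-03", "COFFEE SHOP", "-4.50"),
    ("2026-01-03", "COFFEE SHOP", "-4.50"),
    ("2026-01-05", "PAYROLL", "2500.00"),
]
run_2 = [
    ("2026-01-03", "COFFEE SHOP", "-4.50"),
    ("2026-01-05", "PAYROLL", "2500.00"),
]

duplicates = find_duplicates(run_1)
only_in_1, only_in_2 = diff_runs(run_1, run_2)
```

Identical rows are not necessarily errors, since a statement can legitimately contain two identical purchases on the same day. A result from either function is a prompt to check the source document, not a basis for automatic correction.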

5.2 Your obligation to verify. You must independently verify the Output against the original source documents before using, sharing, or relying on it. The Service is intended to assist you; it is not a replacement for human review. See Section 8 of the Terms of Service and Section 3 of the Disclaimer for the full obligation.

5.3 High-stakes decisions. You agree not to use the Output, without independent human review by a licensed or qualified professional, for any decision in the following sensitive areas: financial activities and credit; insurance; legal; medical; employment, housing, or education; essential government services; product safety; national security; migration; or law enforcement. See Section 8.3 of the Terms of Service and Section 4 of the Disclaimer.

6. Data Flow and Data Residency

6.1 Data flow. When you upload a document, Customer Personal Data (as defined in the DPA) flows from your browser to our application, from our application to our OCR sub-processor (Mistral AI) and back, and then, in the form of the OCR output, to our LLM sub-processor (Google, via the Gemini API) and back. It is stored in our object storage and database.

6.2 Cross-border processing. Some sub-processors are located in countries other than the United States (for example, our OCR sub-processor Mistral AI is based in France). Some sub-processors may use sub-processors of their own in additional locations. The locations applicable as of the Last Updated date above are listed in our Sub-Processors page.

6.3 Transfer mechanisms. Cross-border transfers of personal data from the EEA, the UK, or Switzerland to other jurisdictions are made on the basis of the transfer mechanisms set out in Section 6 of our DPA (Standard Contractual Clauses, UK Addendum, and equivalent mechanisms).

7. Output Ownership and License

7.1 You own the Output. As between you and us, you own the Output generated from your Content, as set out in Section 9 of the Terms of Service.

7.2 Use of Output. Output may not be used in any manner that violates the Acceptable Use Policy, including but not limited to use for high-stakes automated decisions without human review, or use to train, develop, or improve any competing AI product, model, or service.

7.3 Similarity of Output. Due to the nature of AI, Output may not be unique. Other users may receive similar Output from similar inputs. Our assignment of Output rights does not extend to other users' Output.

8. Human Review

We may, on a limited basis and only as reasonably necessary to operate, secure, and improve the Service, have human personnel review specific items of Output or specific Customer documents — for example, to investigate a reported error, to triage a security or abuse incident, or to debug a processing failure. Such review is performed by personnel bound by confidentiality obligations, is logged, and is conducted under the principle of least privilege. We do not engage in routine human review of Customer documents or Output as part of the conversion pipeline; the conversion pipeline is automated.

9. Updates to This Disclosure

This Disclosure may be updated to reflect changes in our AI sub-processors, in their published terms, or in our own practices. Material changes (for example, addition of a new AI sub-processor or a change in the category of underlying LLM providers) are subject to the thirty (30) day notice mechanism described in Section 5 of our DPA. Non-material changes (for example, formatting, clarifications) take effect upon posting.

10. Contact

For questions about AI processing in the Service, or to request additional information about a specific sub-processor or underlying provider:

Human Beyond LLC
Email: hello@human-beyond.ai