# Data Reconciliation API

A Flask-based service for reconciling financial transaction data from Excel and PDF sources. The application includes background processing for exporting processed spreadsheets, audit-friendly logging, and security hardening for production use (API key authentication, rate limiting, and restricted CORS).

## Features

- Import data from uploaded files or from remote URLs on hosts listed in `ALLOWED_DOWNLOAD_HOSTS`.
- Highlight matches across multiple Excel workbooks and PDF documents.
- Export results asynchronously without blocking API responses.
- Centralised logging with rotating file handlers.
- Background task and download cache state persisted in SQLite and shared across workers.
- Built-in API key authentication, request rate limiting, and restrictive CORS configuration.

## Getting Started

### Prerequisites

- Python 3.11+
- Optional: Docker (for containerised deployment)

### Installation

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `FLASK_ENV` | Application environment (`development`, `production`, `testing`) | `development` |
| `SECRET_KEY` | Secret key for Flask sessions | Randomly generated at startup (set explicitly in production) |
| `ALLOWED_ORIGINS` | Comma-separated list of allowed CORS origins | `http://localhost:3000` |
| `DEFAULT_RATE_LIMITS` | Global rate limits applied by Flask-Limiter | `200 per day;50 per hour` |
| `CHECK_RATE_LIMIT` | Rate limit applied to the `/check` endpoint | `10 per minute` |
| `API_KEYS` | Comma-separated list of valid API keys | _required_ |
| `ALLOWED_DOWNLOAD_HOSTS` | Comma-separated list of hostnames allowed for remote downloads | _required for remote downloads_ |
| `UPLOAD_FOLDER` | Directory used for temporary uploads | `uploads` |
| `BACKGROUND_WORKERS` | Number of threads in the background executor | `4` |
| `BACKGROUND_TASK_RETENTION_SECONDS` | How long background task metadata is kept, in seconds | `3600` |
| `STATE_DB_PATH` | Path for the SQLite state store | `runtime_state.sqlite3` |
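A minimal sketch of reading the variables in the table into a settings dict, assuming comma-separated values are split and trimmed and that a missing `API_KEYS` should stop startup. `load_settings` is a hypothetical helper, not the project's configuration module. A `SECRET_KEY` generated with `secrets` differs in every process, which is one reason to set it explicitly when running several workers.

```python
import os
import secrets
from collections.abc import Mapping

VALID_ENVIRONMENTS = {"development", "production", "testing"}


def _split_csv(value: str) -> list[str]:
    """Split a comma-separated string, trimming spaces and dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Hypothetical loader mirroring the documented variables and defaults."""
    flask_env = env.get("FLASK_ENV", "development")
    if flask_env not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown FLASK_ENV: {flask_env!r}")

    api_keys = _split_csv(env.get("API_KEYS", ""))
    if not api_keys:
        raise RuntimeError("API_KEYS must contain at least one key")

    return {
        "FLASK_ENV": flask_env,
        # Random per process unless set; set it explicitly in production.
        "SECRET_KEY": env.get("SECRET_KEY") or secrets.token_hex(32),
        "ALLOWED_ORIGINS": _split_csv(env.get("ALLOWED_ORIGINS", "http://localhost:3000")),
        # Kept as a raw string: Flask-Limiter accepts ';'-separated limits.
        "DEFAULT_RATE_LIMITS": env.get("DEFAULT_RATE_LIMITS", "200 per day;50 per hour"),
        "CHECK_RATE_LIMIT": env.get("CHECK_RATE_LIMIT", "10 per minute"),
        "API_KEYS": api_keys,
        # Hostnames are case-insensitive, so normalise to lower case.
        "ALLOWED_DOWNLOAD_HOSTS": [h.lower() for h in _split_csv(env.get("ALLOWED_DOWNLOAD_HOSTS", ""))],
        "UPLOAD_FOLDER": env.get("UPLOAD_FOLDER", "uploads"),
        "BACKGROUND_WORKERS": int(env.get("BACKGROUND_WORKERS", "4")),
        "BACKGROUND_TASK_RETENTION_SECONDS": int(env.get("BACKGROUND_TASK_RETENTION_SECONDS", "3600")),
        "STATE_DB_PATH": env.get("STATE_DB_PATH", "runtime_state.sqlite3"),
    }
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests.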

Keep your local settings in a `.env` file with one `NAME=value` pair per line; the Docker command below loads the same file with `--env-file`.

### Running the Application

```bash
export FLASK_ENV=development
export API_KEYS=dev-secret-key
export ALLOWED_ORIGINS=http://localhost:3000
export ALLOWED_DOWNLOAD_HOSTS=example.com
python run.py
```

The API is available at `http://localhost:5000` by default.

### Running Tests

```bash
pytest
```

## API Overview

See [docs/API.md](docs/API.md) for detailed request and response examples.

Key points:

- Include the header `X-API-Key` with a valid key on every request.
- Requests that exceed the configured rate limits (`DEFAULT_RATE_LIMITS` globally, `CHECK_RATE_LIMIT` for `/check`) receive `429 Too Many Requests`.
- Remote file downloads are only permitted from hosts listed in `ALLOWED_DOWNLOAD_HOSTS`.

## Docker

Build the production image and run it, publishing the container's port 8000 on the host (`http://localhost:8000`):

```bash
docker build -t data-reconciliation .
docker run --env-file .env -p 8000:8000 data-reconciliation
```

## Security Notes

- API key authentication and rate limiting are mandatory in production deployments.
- CORS allows only `http://localhost:3000` by default; list any additional origins explicitly in `ALLOWED_ORIGINS`.
- Background task metadata and download caches are stored in SQLite so that every worker process sees the same state.
- Remote downloads and file uploads are validated before processing to mitigate SSRF and path traversal attacks.
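A minimal sketch of the two validations named above, assuming exact hostname matching against `ALLOWED_DOWNLOAD_HOSTS` and a flat upload directory. `is_allowed_download_url`, `safe_upload_path`, and the extension set are hypothetical, not the project's code. A hostname allowlist alone does not cover HTTP redirects or DNS rebinding, so treat this as one layer rather than a complete SSRF defence.

```python
import os
from urllib.parse import urlsplit

# Hypothetical: the README mentions Excel and PDF inputs only.
ALLOWED_UPLOAD_EXTENSIONS = {".xlsx", ".xls", ".pdf"}


def is_allowed_download_url(url: str, allowed_hosts: set[str]) -> bool:
    """Accept only http(s) URLs whose hostname exactly matches a (lower-case) allowed host."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"}:
        return False
    # .hostname is lower-cased with any user:pass@ prefix and :port removed,
    # so "https://example.com@evil.test/" yields "evil.test".
    return parts.hostname is not None and parts.hostname in allowed_hosts


def safe_upload_path(upload_folder: str, filename: str) -> str:
    """Return an absolute path directly inside upload_folder or raise ValueError."""
    name = os.path.basename(filename)  # drop any client-supplied directories
    if os.path.splitext(name)[1].lower() not in ALLOWED_UPLOAD_EXTENSIONS:
        raise ValueError(f"unsupported file type: {filename!r}")
    base = os.path.realpath(upload_folder)
    candidate = os.path.realpath(os.path.join(base, name))
    # realpath resolves symlinks, so a link pointing outside the folder is rejected too.
    if os.path.dirname(candidate) != base:
        raise ValueError(f"unsafe upload filename: {filename!r}")
    return candidate
```

Stripping directories with `basename` neutralises `../` sequences, and the final `dirname` comparison is a second check that the resolved path really sits in the upload folder.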

## License

This project is provided as-is without a specific license. Contact the maintainers for usage questions.
