Trumpxiety
Methodology
How the Anxiety Scoring System Works
A Complete Technical Explainer — From Data Ingestion to Final Score
1. What This System Does (The Big Picture)
Trumpxiety is a political anxiety barometer. Every six hours, it automatically harvests political news articles from multiple sources, sends them to an AI language model for sentiment analysis, computes per-category anxiety scores, and exposes those scores through a REST API consumed by a frontend dashboard.
The core question the system answers is: based on what the news is saying right now, how much anxiety should a politically aware person feel across nine distinct areas of political life?
The output is a number between 0 and 100 for each of nine categories, plus a confidence-weighted aggregate ("top-level") score. These are not counts of bad news stories. They are sentiment valence measurements — the emotional tone of coverage, not its volume.
2. System Architecture at a Glance
The system is organized as a Backend-for-Frontend (BFF) service. The frontend never touches the database directly. Every data request goes through this service's HTTP API.
The architecture has three distinct runtime contexts:
The Scheduled Pipeline runs every six hours, triggered by an external scheduling system (not by the HTTP server itself). It fetches articles, scores them via AI, and writes the results to a relational database.
The HTTP Server is a read-mostly API server. It reads persisted scores from the database and serves them to the frontend in a structured JSON format.
The Database is the single source of truth. It stores every pipeline run, every snapshot, every article, every article weight, and the full scoring history.
External Scheduler (every 6 hours)
│
▼
POST /internal/pipeline/run
│
▼
Pipeline Execution
┌─────────────────────────────────────────┐
│ 1. Fetch articles (3 providers) │
│ 2. Normalize + deduplicate │
│ 3. Supplementary fetch (thin cats.) │
│ 4. Enrich article content │
│ 5. Build AI News sidecar │
│ 6. Synthesize via LLM │
│ 7. Compute top-level score │
│ 8. Write to database │
└─────────────────────────────────────────┘
│
▼
Relational Database (PostgreSQL)
│
▼
HTTP API → Frontend Dashboard
3. The Nine Anxiety Categories
The system measures anxiety across exactly nine political domains. These are fixed in the schema and the scoring algorithm. Each has a defined scope:
Politics covers electoral dynamics, policy, legislation, partisan shifts, congressional activity, and campaigns.
Culture covers social movements, identity discourse, media dynamics, misinformation, and social cohesion issues.
International Affairs covers diplomatic relations, trade agreements, alliances, bilateral relations, and multilateral agreements.
Global Security covers military actions, defense posture, nuclear concerns, cyber threats, and armed conflicts.
Health covers public health policy, healthcare access, pandemic response, and healthcare system impacts.
Business covers economic policy, markets, corporate regulations, trade policy, labor dynamics, and inflation.
Climate covers climate policy, environmental regulation, energy transition, disasters, and emissions.
Education covers education policy, schools and universities, student rights, curriculum debates, and academic funding.
Science & Technology covers science policy, technology regulation, AI and data governance, research funding, and platform policy.
These nine categories are stored internally using snake_case identifiers (e.g., international_affairs) and exposed externally via the API using kebab-case identifiers (e.g., international-affairs).
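Because the two identifier styles differ only in their separator, the mapping between them can be a plain character swap. A minimal sketch, assuming identifiers contain only lowercase letters and a single kind of separator; the function names are hypothetical:

```typescript
// Hypothetical helpers: internal IDs use snake_case, public API IDs use kebab-case.
function toPublicCategoryId(internalId: string): string {
  return internalId.replace(/_/g, "-"); // international_affairs -> international-affairs
}

function toInternalCategoryId(publicId: string): string {
  return publicId.replace(/-/g, "_"); // international-affairs -> international_affairs
}
```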
4. The Anxiety Score: What It Actually Measures
Before diving into mechanics, it is critical to understand what the anxiety score represents, because its behavior can be counterintuitive.
Anxiety reflects sentiment valence, not topic volume.
A heavily-covered positive development produces a low score. A lightly-covered credible threat produces a high score. The system is measuring emotional signal, not headline count.
The scale runs from 0 to 100:
- 0–20: Very low anxiety. Broadly positive or stable signals.
- 20–40: Low anxiety. Mostly calm, minor concerns.
- 40–60: Moderate anxiety. Mixed signals, uncertainty.
- 60–80: Elevated anxiety. Significant concern, active tensions.
- 80–100: High anxiety. Crisis signals, existential concerns.
The conversion from news sentiment to anxiety is mathematically defined. News sentiment scores (which run from −1.0 for maximally negative to +1.0 for maximally positive) map to anxiety as:
$$\text{anxietyScore} = \left(1 - \frac{\text{sentimentScore} + 1}{2}\right) \times 100$$
This means a perfectly positive sentiment (+1.0) maps to anxiety 0, a neutral score (0.0) maps to 50, and a maximally negative sentiment (−1.0) maps to 100. This inversion is intentional: good news produces low anxiety, bad news produces high anxiety.
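The formula simplifies to anxietyScore = 50 × (1 − sentimentScore). A TypeScript sketch of the same mapping; clamping out-of-range input to [−1, 1] is an assumption, not something the methodology specifies, and the function name is hypothetical:

```typescript
// Convert sentiment on [-1, 1] into anxiety on [0, 100]:
// anxiety = (1 - (s + 1) / 2) * 100, equivalently 50 * (1 - s).
function sentimentToAnxiety(sentimentScore: number): number {
  // Assumption: out-of-range input is clamped rather than rejected.
  const s = Math.min(1, Math.max(-1, sentimentScore));
  return (1 - (s + 1) / 2) * 100;
}
```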
5. Stage One: Data Ingestion
Every pipeline run begins by fetching news articles from three complementary providers. These are run in parallel using Promise.allSettled, which means if one provider fails, the pipeline continues with the remaining two rather than aborting entirely.
Provider A — Aggregate Signal Provider (Free/Public)
This provider is queried for its pre-computed tone and sentiment signals about news events. It does not provide full article text. Its value is the pre-computed emotional tone it attaches to events, which saves the LLM from having to re-derive sentiment from raw text. Tone values are normalized from a −100 to +100 scale to a −1.0 to +1.0 scale before being stored. Articles are filtered to English-language content only.
This provider has a maximum of 15 articles per request (a hard upstream limit). It also has rate-limiting behavior, which the system handles with a circuit breaker: if too many consecutive rate-limit errors occur, requests are paused for a cooldown period before retrying.
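The rate-limit breaker described here (and the zero-result breaker described for Provider C) follows the usual consecutive-failure pattern. A minimal sketch, assuming time is passed in as milliseconds; the class name, failure threshold, and cooldown length are hypothetical, since the methodology does not give values:

```typescript
// Consecutive-failure circuit breaker: after `maxConsecutiveFailures`
// rate-limit errors in a row, refuse requests for `cooldownMs`.
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openUntil = 0;
  private readonly maxConsecutiveFailures: number;
  private readonly cooldownMs: number;

  constructor(maxConsecutiveFailures: number, cooldownMs: number) {
    this.maxConsecutiveFailures = maxConsecutiveFailures;
    this.cooldownMs = cooldownMs;
  }

  /** True if a request may be sent at time `now` (ms since epoch). */
  canRequest(now: number): boolean {
    return now >= this.openUntil;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  /** Count a rate-limit error; open the breaker once the threshold is reached. */
  recordRateLimit(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.maxConsecutiveFailures) {
      this.openUntil = now + this.cooldownMs;
      this.consecutiveFailures = 0; // start counting afresh after the cooldown
    }
  }
}
```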
Provider B — Article Breadth Provider
This provider supplies article titles, URLs, descriptions, and publication timestamps. It does not supply pre-computed sentiment. Its value is breadth and diversity — it covers a wide range of English-language news sources and provides article descriptions that give the LLM additional context.
The lack of pre-computed sentiment means the LLM must infer anxiety from the article title and description alone when Provider B is the only source for a given article. This is acceptable — titles and descriptions are often sufficient to assess valence.
Provider C — Enrichment Provider
This provider is the highest-value source because it supplies VADER sentiment scores pre-computed per article. These scores are on the −1.0 to +1.0 scale and can be fed directly to the LLM as structured input, dramatically reducing the number of tokens the LLM needs to consume.
Provider C also supplies article body text (used as a long description), source names, and publication timestamps. It has circuit-breaker logic: if it returns zero results across several consecutive runs, requests are paused for a cooldown period.
The Query Strategy
All three providers are queried with the same core keyword set targeting political coverage. The primary query looks for articles mentioning key political figures and institutions associated with the current U.S. administration.
Each provider is asked to return articles published within the last 6 hours (or 24 hours if the system is using an expanded lookback window). Articles are sorted by publication date descending, and a configurable maximum per source is applied (defaulting to 50 per provider for Providers A and B, 30 for Provider C).
Failure Handling
If a provider fails (network timeout, rate limit, API error), the pipeline logs the failure and continues with whatever articles were successfully fetched from the other providers. The failed provider's fetch is recorded in the database as a source fetch summary with status: 'failed'. The final pipeline run is not marked as failed unless all providers fail and no articles are available.
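The parallel fan-out with Promise.allSettled and the failure rule above can be sketched as follows. The Provider and FetchedArticle shapes, the "succeeded" status value, and the error message are assumptions; only the 'failed' status and the "fail only when every provider fails and no articles are available" rule come from the text:

```typescript
interface FetchedArticle { url: string; title: string; } // simplified shape

interface SourceFetchSummary {
  source: string;
  status: "succeeded" | "failed";
  articleCount: number;
  error?: string;
}

interface Provider {
  name: string;
  fetch: () => Promise<FetchedArticle[]>;
}

async function fetchAllProviders(
  providers: Provider[],
): Promise<{ articles: FetchedArticle[]; summaries: SourceFetchSummary[] }> {
  // allSettled never rejects, so one failing provider cannot abort the others.
  const results = await Promise.allSettled(providers.map((p) => p.fetch()));
  const articles: FetchedArticle[] = [];
  const summaries: SourceFetchSummary[] = [];
  results.forEach((result, i) => {
    const source = providers[i].name;
    if (result.status === "fulfilled") {
      articles.push(...result.value);
      summaries.push({ source, status: "succeeded", articleCount: result.value.length });
    } else {
      summaries.push({ source, status: "failed", articleCount: 0, error: String(result.reason) });
    }
  });
  // The run fails outright only when every provider failed and nothing came back.
  if (articles.length === 0 && summaries.every((s) => s.status === "failed")) {
    throw new Error("All providers failed; no articles available");
  }
  return { articles, summaries };
}
```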
6. Stage Two: Normalization
Every provider returns articles in a different shape. Before any further processing, all articles are converted into a common internal format called NormalizedArticle. This format contains:
- A content fingerprint: A SHA-256 hash of the article's title, source name, and publication hour (not minute — the hour-level granularity allows the same story published at slightly different times across providers to share a fingerprint).
- Title and description: The description is stripped of HTML and truncated to 300 characters. This is exactly the text sent to the LLM; full article text is never sent to the model.
- URL: Used for deduplication and as a reference link in the API output.
- Source name: The news outlet name.
- Publication date: Normalized to UTC.
- Sentiment score: A float between −1.0 and +1.0 if pre-computed by the provider, otherwise `null`.
- Sentiment source: Which provider computed the sentiment (`provider_a`, `provider_c`, or `null`).
- Political bias: An inferred political lean (left, center, or right; unknown when the outlet is not in the bias registry) based on the outlet's source name, used in source balance metrics.
- Article content: A structured content object (see Stage Four).
Articles that lack a title or URL are discarded. Articles with titles containing foreign-language indicators (non-Latin scripts, foreign stopwords) are discarded to keep the corpus English-language only.
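The content fingerprint can be sketched with Node's built-in crypto module. The normalization (trimming and lowercasing title and source), the field order, and the "|" separator are assumptions; the text specifies only the three inputs, hour-level granularity, and SHA-256. The function name is hypothetical:

```typescript
import { createHash } from "node:crypto";

// SHA-256 over title, source name, and the UTC publication hour.
function contentFingerprint(title: string, sourceName: string, publishedAt: Date): string {
  const hour = publishedAt.toISOString().slice(0, 13); // "YYYY-MM-DDTHH" (UTC)
  const payload = [title.trim().toLowerCase(), sourceName.trim().toLowerCase(), hour].join("|");
  return createHash("sha256").update(payload).digest("hex");
}
```

Hour bucketing has an edge case: a story stamped 10:59 by one provider and 11:01 by another lands in different buckets and gets different fingerprints, which is why the URL pass runs first.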
7. Stage Three: Deduplication
After normalization, all articles from all providers are merged into a single pool and deduplicated in two passes.
Pass One — URL Deduplication: Articles sharing the same URL (case-insensitive, query parameters stripped) are collapsed into one. When two articles share a URL, the one with richer metadata is kept. Richness is ranked in this order: article with extracted full text > article with a long summary > article with sentiment score and description > article with description only > article with title and URL only.
Pass Two — Fingerprint Deduplication: Articles sharing the same SHA-256 content fingerprint (same title + source + hour of publication) are collapsed using the same richness ranking. This catches cases where the same story is published on slightly different URLs across different provider results.
The result of deduplication is a single ordered list of unique articles, sorted by publication date descending.
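The two passes and the richness ranking can be sketched as below. The Candidate shape and its field names are assumptions; the ranking order, the URL normalization (case-insensitive, query string stripped), and the URL-then-fingerprint order follow the text. Also dropping the URL fragment is an additional assumption:

```typescript
interface Candidate {
  url: string;
  fingerprint: string;
  publishedAt: Date;
  description?: string;
  sentimentScore?: number | null;
  contentType?: "extracted_article" | "long_summary" | null;
}

// Lowercase the URL and drop its query string (and fragment).
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.search = "";
  u.hash = "";
  return u.toString().toLowerCase();
}

// Richness ranking from the methodology; higher is richer.
function richness(a: Candidate): number {
  if (a.contentType === "extracted_article") return 4;
  if (a.contentType === "long_summary") return 3;
  if (a.description && a.sentimentScore != null) return 2;
  if (a.description) return 1;
  return 0; // title and URL only
}

// Keep the richest article per key; ties keep the first one seen.
function dedupeBy(articles: Candidate[], key: (a: Candidate) => string): Candidate[] {
  const best = new Map<string, Candidate>();
  for (const a of articles) {
    const k = key(a);
    const current = best.get(k);
    if (!current || richness(a) > richness(current)) best.set(k, a);
  }
  return Array.from(best.values());
}

function deduplicate(articles: Candidate[]): Candidate[] {
  const byUrl = dedupeBy(articles, (a) => normalizeUrl(a.url));
  const byFingerprint = dedupeBy(byUrl, (a) => a.fingerprint);
  // Newest first.
  return byFingerprint.sort((x, y) => y.publishedAt.getTime() - x.publishedAt.getTime());
}
```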
8. Stage Three-B: Supplementary Fetching
After the primary fetch and deduplication, the system estimates how many articles it has for each of the nine categories. It does this by scanning article titles and descriptions for category-specific keywords. If a category falls below a minimum article threshold (which varies by category — politics needs at least 5, climate needs at least 2, for example), a targeted supplementary fetch is triggered for that category.
The supplementary fetch queries one of the providers (preferring the enrichment provider) with a category-specific keyword combination layered on top of the core political query. For example, a thin Health category might trigger a query like: (trump OR "donald trump" OR "trump administration") AND (health OR FDA OR vaccine OR Medicaid OR Medicare).
If the supplementary fetch for the standard 6-hour window still produces too few articles, the system retries with an expanded 24-hour lookback window. Categories scored with articles from this expanded window are later assigned the stale_24h scoring tier, which reduces their confidence contribution.
The supplementary results are deduplicated against the primary pool and appended. Only categories that were actually expanded with 24-hour-window articles get the stale_24h tier marking.
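Thin-category detection and the supplementary query can be sketched as follows. Only the Health keywords, the core terms from the example query, and the politics (5) and climate (2) thresholds come from the text; the default threshold, the fallback to the category name as a keyword, and all function names are hypothetical. Matching is a naive case-insensitive substring test:

```typescript
const CORE_TERMS = ["trump", '"donald trump"', '"trump administration"'];

// Only the Health entry is taken from the methodology's example.
const CATEGORY_KEYWORDS: Record<string, string[]> = {
  health: ["health", "FDA", "vaccine", "Medicaid", "Medicare"],
};
const MIN_ARTICLES: Record<string, number> = { politics: 5, climate: 2 };
const DEFAULT_MIN_ARTICLES = 3; // hypothetical

function orGroup(terms: string[]): string {
  return `(${terms.join(" OR ")})`;
}

// Layer the category keywords on top of the core political query.
function buildSupplementaryQuery(category: string): string {
  const keywords = CATEGORY_KEYWORDS[category] ?? [category];
  return `${orGroup(CORE_TERMS)} AND ${orGroup(keywords)}`;
}

// Count articles whose title or description mentions any keyword.
function estimateCategoryCount(
  articles: { title: string; description?: string }[],
  keywords: string[],
): number {
  const lowered = keywords.map((k) => k.toLowerCase());
  return articles.filter((a) => {
    const text = `${a.title} ${a.description ?? ""}`.toLowerCase();
    return lowered.some((k) => text.includes(k));
  }).length;
}

function isThin(category: string, count: number): boolean {
  return count < (MIN_ARTICLES[category] ?? DEFAULT_MIN_ARTICLES);
}
```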
9. Stage Four: Article Content Enrichment
After the full deduplicated article pool is assembled, the system attempts to fetch the full text of each article by visiting its source URL. This is done with a concurrency limit (defaulting to 5 simultaneous fetches) and a per-article timeout (defaulting to 5 seconds).
This stage is purely additive and never fails the pipeline. If an article's URL times out, returns a non-200 response, or returns non-HTML content, the article simply retains whatever content it already has from the provider's description.
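The bounded-concurrency fetch with a per-article timeout can be sketched with a small worker pool. The helper names are hypothetical; the limit (default 5) and timeout (default 5 seconds) would come from configuration:

```typescript
// Run `worker` over `items` with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++; // safe: no await between the check and the increment
      results[index] = await worker(items[index]);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Resolve to null (never reject) if the promise fails or takes longer than timeoutMs.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T | null> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(null), timeoutMs);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(null); },
    );
  });
}
```

A worker would wrap its page fetch in withTimeout and keep the provider description when the result is null, which matches the "purely additive" rule. withTimeout stops waiting but does not cancel the request; with Node's built-in fetch, passing AbortSignal.timeout(5000) as the signal would also abort the network call.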
Security note: Before fetching any URL, the system checks whether the target IP address is a private, loopback, or link-local address. URLs resolving to RFC-1918 private address ranges, loopback addresses, or link-local addresses are silently skipped. This prevents server-side request forgery (SSRF) attacks where a malicious article URL could cause the server to make requests to internal infrastructure.
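A sketch of the address check, assuming the hostname has already been resolved to IP literals (for example with dns.promises.lookup(host, { all: true }) from Node's standard library). The function name is hypothetical, and it covers only the three address classes named above, so it is narrower than a production deny-list (which would typically add ranges such as 0.0.0.0/8 and fc00::/7):

```typescript
import { isIP } from "node:net";

// True if the enrichment fetcher should refuse to contact this address.
function isBlockedAddress(ip: string): boolean {
  const family = isIP(ip);
  if (family === 4) {
    const [a, b] = ip.split(".").map(Number);
    return (
      a === 10 ||                          // 10.0.0.0/8 (RFC 1918)
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 (RFC 1918)
      (a === 192 && b === 168) ||          // 192.168.0.0/16 (RFC 1918)
      a === 127 ||                         // 127.0.0.0/8 loopback
      (a === 169 && b === 254)             // 169.254.0.0/16 link-local
    );
  }
  if (family === 6) {
    const lower = ip.toLowerCase();
    // ::1 loopback; fe80::/10 link-local (first group fe80 through febf).
    return lower === "::1" || /^fe[89ab][0-9a-f]:/.test(lower);
  }
  return true; // not an IP literal: refuse rather than guess
}
```

Checking the resolved addresses and then letting the HTTP client resolve the name a second time leaves a DNS-rebinding window; connecting to the already-vetted address closes it.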
When full HTML is successfully retrieved, the system extracts the page text, removes scripts, styles, and HTML tags, and produces a structured ArticleContent object. This object has one of two types:
`extracted_article`: The full article body text was successfully retrieved and is long enough (500+ characters) to be useful. The content is stored as an array of paragraphs.
`long_summary`: The full text was unavailable or too short, but the system has enough information from the title, source attribution, and description to assemble a useful summary. This is stored in the same paragraph-array format.
If neither is possible (e.g., no description and no fetchable content), articleContent is null.
This enriched content serves two purposes: it is stored in the database as evidence for human review, and it is what the frontend displays in article modals.
10. Stage Five: The AI News Sidecar
Parallel to the main scoring pipeline, the system assembles a separate "AI News" section. This section is purely informational — it has no effect on any anxiety score, confidence value, or category assignment. It is a curated list of AI-related news articles displayed alongside the scores.
The AI News fetch uses the same provider infrastructure but with a completely different keyword set targeting general artificial intelligence coverage (model releases, chip demand, regulatory activity, etc.).
After fetching and deduplicating AI News candidates, the system removes any articles that already appear in the main scoring pool (by URL or fingerprint), since category/scoring content takes priority.
The remaining candidates are sent to the LLM in a separate prompt that asks for a 1–2 sentence neutral summary of AI news and a ranked list of up to five article IDs. If the LLM call fails, the system falls back to a deterministic headline digest constructed from the candidates' titles — it does not abort the scoring run.
The AI News section is stored separately in the database and returned as a distinct field in the API response.
11. Stage Six: LLM Synthesis
This is the heart of the system. The deduplicated, enriched article pool is sent to an AI language model in a structured prompt. The LLM's job is to:
1. Assign each article to one to four categories with weights that sum to exactly 1.0.
2. Compute a per-category anxiety score (0–100), confidence level (0.0–1.0), and article count.
3. Write a 1–3 sentence reasoning string for each category.
4. Flag any category where news sources show significant disagreement about anxiety direction.
The Prompt Contract
The system prompt establishes the LLM as a political sentiment analyst and provides precise rules. Key rules include:
- Anxiety reflects sentiment valence, not topic volume.
- Each article must be assigned to 1–4 categories with weights summing to 1.0.
- Confidence reflects source agreement and coverage breadth.
- A disagreement flag is created when the majority-side split ratio is 0.70 or lower (i.e., sources split 70/30 or closer on anxiety direction).
- The output must be strict JSON — no markdown fences, no prose.
The user prompt includes all articles serialized as JSON, with each article's ID, title, description, source name, and pre-computed sentiment score (if available).
Batching
If the deduplicated article count exceeds 100, the system batches articles in groups of up to 25 per LLM call. Results are merged by averaging category scores weighted by each batch's article count contribution.
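The batching rule and the count-weighted merge can be sketched as follows. The type and function names are hypothetical, and merging confidence with the same article-count weights as the score is an assumption; the text specifies the weighting only for scores:

```typescript
interface CategoryResult { score: number; confidence: number; articleCount: number; }
type BatchResult = Record<string, CategoryResult>;

const BATCH_THRESHOLD = 100; // batch only above this many articles
const BATCH_SIZE = 25;

function toBatches<T>(items: T[]): T[][] {
  if (items.length <= BATCH_THRESHOLD) return [items];
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += BATCH_SIZE) batches.push(items.slice(i, i + BATCH_SIZE));
  return batches;
}

// Average each category across batches, weighted by articles contributed.
function mergeBatches(batches: BatchResult[]): BatchResult {
  const categories = new Set<string>();
  for (const b of batches) for (const k of Object.keys(b)) categories.add(k);

  const merged: BatchResult = {};
  for (const category of Array.from(categories)) {
    let weightedScore = 0;
    let weightedConfidence = 0;
    let total = 0;
    for (const batch of batches) {
      const r = batch[category];
      if (!r || r.articleCount === 0) continue;
      weightedScore += r.score * r.articleCount;
      weightedConfidence += r.confidence * r.articleCount;
      total += r.articleCount;
    }
    merged[category] = total === 0
      ? { score: 0, confidence: 0, articleCount: 0 } // left for fallback scoring
      : { score: weightedScore / total, confidence: weightedConfidence / total, articleCount: total };
  }
  return merged;
}
```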
Response Validation and Repair
The LLM response is parsed and validated strictly. If the JSON is malformed, the system makes a second LLM call — a "repair pass" — asking it to fix the JSON. If the repair also fails, and the batch is large enough to split, the system recursively splits the batch in half and tries each half independently.
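The validate, repair, then split sequence can be sketched as a recursive function. The callModel, repairJson, and parse parameters are hypothetical stand-ins for the LLM client and the strict validator, and treating any batch of two or more articles as "large enough to split" is an assumption. The text does not say what happens when a single-article batch still fails; this sketch throws:

```typescript
async function synthesizeWithRepair<T, R>(
  batch: T[],
  callModel: (batch: T[]) => Promise<string>,
  repairJson: (raw: string) => Promise<string>,
  parse: (raw: string) => R | null, // null means invalid or malformed
  minSplitSize = 2,
): Promise<R[]> {
  const raw = await callModel(batch);
  const first = parse(raw);
  if (first !== null) return [first];

  // One repair pass: ask the model to fix its own JSON.
  const repaired = parse(await repairJson(raw));
  if (repaired !== null) return [repaired];

  if (batch.length < minSplitSize) {
    throw new Error(`Unrecoverable LLM response for a batch of ${batch.length}`);
  }
  // Split in half and try each half independently.
  const mid = Math.ceil(batch.length / 2);
  const left = await synthesizeWithRepair(batch.slice(0, mid), callModel, repairJson, parse, minSplitSize);
  const right = await synthesizeWithRepair(batch.slice(mid), callModel, repairJson, parse, minSplitSize);
  return left.concat(right);
}
```

The partial results can then be merged with the same count-weighted averaging used for ordinary batches.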
After parsing, the system enforces several invariants:
- Scores are clamped to [0, 100].
- Confidence values are clamped to [0.0, 1.0].
- Article weights are rescaled to sum to 1.0 when their total deviates from 1.0 by more than 0.01.
- Any article from the input batch that the LLM failed to include in `articleWeights` is assigned a fallback weight template derived from the top-scoring categories.
- If a category is missing entirely from the LLM response, it receives a score of 0, confidence of 0, and a scoring tier of `insufficient`.
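The clamping and weight-normalization invariants can be sketched as pure functions. Interpreting the ±0.01 tolerance as "leave weights alone when their sum is already within 0.01 of 1.0" and dropping non-positive weights are assumptions; the function names are hypothetical:

```typescript
const WEIGHT_TOLERANCE = 0.01;

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Rescale one article's category weights to sum to 1.0 when they drift too far.
function normalizeWeights(weights: Record<string, number>): Record<string, number> {
  const positive = Object.entries(weights).filter(([, w]) => w > 0);
  const sum = positive.reduce((acc, [, w]) => acc + w, 0);
  if (sum === 0) return {};
  if (Math.abs(sum - 1) <= WEIGHT_TOLERANCE) return Object.fromEntries(positive);
  return Object.fromEntries(positive.map(([category, w]) => [category, w / sum]));
}

// Scores to [0, 100], confidence to [0.0, 1.0].
function sanitizeCategory(raw: { score: number; confidence: number }) {
  return { score: clamp(raw.score, 0, 100), confidence: clamp(raw.confidence, 0, 1) };
}
```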
Token Efficiency
A key design principle is minimizing LLM token usage. By using pre-computed sentiment scores from Provider C and tone signals from Provider A as inputs to the prompt, the LLM does not need to re-read raw article text to determine sentiment. It synthesizes existing signals rather than extracting them from scratch. This reduces token usage by approximately 60–70% compared to an approach that sends full article text.
12. Stage Seven: Score Computation
Per-Category Scores
Each category score that comes back from the LLM is an anxietyScore on [0, 100]. These are not further transformed at this stage — they are the LLM's direct assessment of anxiety level based on available signal.
Confidence
Confidence (0.0–1.0) reflects two factors the LLM is asked to assess:
Source agreement: If sources across the political spectrum broadly agree on the emotional tone of a category's coverage, confidence is high. If they diverge sharply (e.g., some framing an event as a win, others as a crisis), confidence is low.
Coverage breadth: Categories with more contributing articles (weighted by their category assignment weights) have higher confidence than categories with only one or two articles.
These two factors are combined as min(agreementFactor, coverageFactor) — the limiting factor determines the ceiling.
Disagreement Flags
A disagreement flag is attached to any category where sources split on the direction of anxiety. The standard trigger is a majority side representing 70% or less of sources (i.e., a 70/30 or closer split); the `sharp-split` severity below covers the opposite extreme.
Flags have two severity levels:
- `even-split`: The majority side is ≤70% of sources.
- `sharp-split`: The majority side is ≥90% of sources. This may seem counterintuitive: a sharp split at 90% means a small but meaningful minority is taking a strongly opposite view on something significant.
Each flag includes a one-liner explanation naming the specific issue causing the split and the approximate split ratio.
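The two severity thresholds can be expressed as a small classifier over the majority side's share of sources. This sketch treats the 70% and 90% thresholds as two separate triggers, as the severity list implies, and assumes a unanimous split (100%) is never flagged because there is no dissenting minority. Whether the minority view concerns something significant is left to the LLM and is not modeled here; all names are hypothetical:

```typescript
type DisagreementSeverity = "even-split" | "sharp-split";

// Share of sources on the majority side of anxiety direction (0.5 to 1.0).
function majorityShare(anxiousSources: number, calmSources: number): number {
  const total = anxiousSources + calmSources;
  return total === 0 ? 1 : Math.max(anxiousSources, calmSources) / total;
}

function classifyDisagreement(share: number): DisagreementSeverity | null {
  if (share <= 0.7) return "even-split";             // 70/30 or closer
  if (share >= 0.9 && share < 1) return "sharp-split"; // small dissenting minority
  return null;                                       // between 70% and 90%, or unanimous
}
```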
Top-Level Score
The top-level aggregate score is computed as a confidence-weighted average of all nine category scores:
$$\text{topLevelScore} = \frac{\sum_{c} \text{score}_c \times \text{confidence}_c}{\sum_{c} \text{confidence}_c}$$
This means a high-confidence category (say, confidence = 0.9) exerts nearly twice as much influence on the top-level score as a low-confidence one (say, confidence = 0.5). Categories with sparse coverage, and therefore low confidence, have proportionally less influence on the overall signal.
The top-level confidence is the simple average of all nine category confidences.
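The aggregation can be sketched in a few lines. The methodology does not say what happens if every category has zero confidence; this sketch falls back to an unweighted mean in that case, which is an assumption, and the names are hypothetical:

```typescript
interface CategoryScore { score: number; confidence: number; }

function computeTopLevel(categories: CategoryScore[]): { score: number; confidence: number } {
  const n = categories.length;
  const totalConfidence = categories.reduce((sum, c) => sum + c.confidence, 0);
  // Top-level confidence: simple average of category confidences.
  const confidence = n === 0 ? 0 : totalConfidence / n;
  if (totalConfidence === 0) {
    // Assumption: unweighted mean when no category carries any confidence.
    const mean = n === 0 ? 0 : categories.reduce((sum, c) => sum + c.score, 0) / n;
    return { score: mean, confidence };
  }
  // Confidence-weighted average of category scores.
  const weighted = categories.reduce((sum, c) => sum + c.score * c.confidence, 0);
  return { score: weighted / totalConfidence, confidence };
}
```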
Score Labels
The top-level score (and implicitly the category scores) maps to a qualitative label:
| Score Range | Label |
|---|---|
| 0–24 | low |
| 25–49 | elevated |
| 50–74 | high |
| 75–100 | severe |
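The label table maps to a chain of lower-bound checks. Applying the thresholds to an unrounded score (so 24.6 counts as "low") is an assumption; for integer scores the result matches the table exactly. The function name is hypothetical:

```typescript
type ScoreLabel = "low" | "elevated" | "high" | "severe";

function scoreLabel(score: number): ScoreLabel {
  if (score < 25) return "low";
  if (score < 50) return "elevated";
  if (score < 75) return "high";
  return "severe";
}
```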
Quiet Period Detection
The system marks a snapshot as a "quiet period" when five or more of the nine categories are using non-fresh fallback scoring tiers (see Stage Eight). This signals to users that the current scores are carrying forward historical data rather than fresh signal, and the dashboard adjusts its presentation accordingly.
13. Stage Eight: Fallback Scoring
Not every pipeline run produces fresh, high-confidence signal for every category. Political news is uneven — some categories may have thin coverage in any given six-hour window. The fallback scoring system handles this gracefully by applying progressively degraded estimates rather than showing misleading zeroes or refusing to display a score.
Scoring Tiers
Every category score has a scoringTier field that communicates how the score was derived:
`fresh`: The score was computed from articles in the current primary window with sufficient coverage and confidence. This is the ideal state.
`stale_24h`: The score was computed from articles within an expanded 24-hour lookback window (or from supplementary fetches that used the expanded window). Confidence is capped at 0.6 and reduced by 25% to reflect the staleness.
`carry_forward`: No usable fresh signal exists, but there are credible recent historical scores within the last 48 hours. The system blends the most recent credible score with the 30-day historical average (70% recent, 30% historical).
`decayed`: The most recent credible score is more than 48 hours old. The score is decayed exponentially toward the 30-day historical average using a half-life of 24 hours. The further back the last fresh score, the closer the current estimate approaches the historical mean.
`insufficient`: No usable recent or historical signal exists at all. The score is set to 45 (a neutral moderate baseline) with a low confidence of 0.25. This is shown with appropriate caveats in the UI.
How Fallback Is Applied
After the LLM returns category scores, each category is independently evaluated through the fallback logic. The system queries recent historical scores for that category from the database, computes a 30-day historical average, and then decides which tier applies.
If the raw LLM score is reliable (sufficient articles, sufficient confidence), it passes through unchanged with a fresh tier. If not, the fallback system produces a blended or decayed estimate, updates the reasoning text to explain the situation, and assigns the appropriate tier.
This means the pipeline always produces a complete nine-category snapshot, even in news deserts.
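The tier decision, together with the quiet-period rule from Stage Seven, can be sketched as follows. Several details are assumptions, not stated in the methodology: the criteria behind `reliable`; applying the 25% reduction before the 0.6 cap for stale_24h; measuring decay from the time of the last credible score (so at exactly 48 hours the last score keeps 25% of its distance from the average, versus a 70% blend just before); leaving confidence unset (null) for carry_forward and decayed; requiring a 30-day average before blending; and counting stale_24h as non-fresh for quiet periods. All names are hypothetical:

```typescript
type ScoringTier = "fresh" | "stale_24h" | "carry_forward" | "decayed" | "insufficient";

interface FallbackInput {
  rawScore: number;
  rawConfidence: number;
  reliable: boolean;           // enough articles and confidence (criteria unspecified)
  usedExpandedWindow: boolean; // scored with 24-hour-window articles
  lastCredibleScore: number | null;
  hoursSinceLastCredible: number | null;
  historicalAverage30d: number | null;
}

interface FallbackResult {
  score: number;
  confidence: number | null; // null: not specified for this tier
  tier: ScoringTier;
}

const HALF_LIFE_HOURS = 24;

function applyFallback(input: FallbackInput): FallbackResult {
  if (input.reliable) {
    if (!input.usedExpandedWindow) {
      return { score: input.rawScore, confidence: input.rawConfidence, tier: "fresh" };
    }
    // Assumed order: reduce by 25%, then cap at 0.6.
    return { score: input.rawScore, confidence: Math.min(0.6, input.rawConfidence * 0.75), tier: "stale_24h" };
  }
  const last = input.lastCredibleScore;
  const hours = input.hoursSinceLastCredible;
  const hist = input.historicalAverage30d;
  if (last !== null && hours !== null && hist !== null) {
    if (hours <= 48) {
      // 70% most recent credible score, 30% 30-day average.
      return { score: 0.7 * last + 0.3 * hist, confidence: null, tier: "carry_forward" };
    }
    // Exponential approach toward the 30-day average with a 24-hour half-life.
    const remaining = Math.pow(0.5, hours / HALF_LIFE_HOURS);
    return { score: hist + (last - hist) * remaining, confidence: null, tier: "decayed" };
  }
  // No usable signal: neutral-moderate baseline with low confidence.
  return { score: 45, confidence: 0.25, tier: "insufficient" };
}

// Quiet period: five or more of the nine categories on a non-fresh tier.
function isQuietPeriod(tiers: ScoringTier[]): boolean {
  return tiers.filter((t) => t !== "fresh").length >= 5;
}
```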
14. Stage Nine: Database Persistence
Everything is written to the relational database in a single atomic transaction. If any write fails, the entire transaction rolls back and the pipeline run is marked failed. There are no partial writes.
The write order within the transaction is:
1. Update the pipeline_runs row to completed, recording token usage, article counts, and timing.
2. Insert one score_snapshots row with the top-level aggregate score and confidence.
3. Insert nine category_scores rows — one per category — including the score, confidence, reasoning text, article count, and scoring tier.
4. Insert zero or more disagreement_flags rows for any categories where source disagreement was detected.
5. Insert N articles rows — one per deduplicated article that contributed to the scoring pool — including the title, URL, source, description, article content, sentiment score, and political bias.
6. Insert M article_category_weights rows — the LLM's per-article category weight assignments, stored sparsely (only assigned categories are stored).
7. Insert one ai_news_sections row and up to five ai_news_articles rows for the AI News sidecar.
The score_snapshots table has a unique constraint on pipeline_run_id, preventing duplicate snapshots from retry scenarios. The articles table has a unique constraint on (pipeline_run_id, fingerprint), preventing duplicate articles.
Full article text is never sent to the LLM. The persisted description is truncated to 300 characters and is the exact text that was sent to the LLM; the enriched article content from Stage Four is stored alongside it as evidence for human review.
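The all-or-nothing write can be sketched with an explicit BEGIN/COMMIT/ROLLBACK wrapper. The QueryClient interface, Statement shape, and function names are hypothetical; a node-postgres client exposes a compatible query(text, values) method and could be passed in its place. Because the rollback discards every write in the transaction, marking the run failed has to happen in a separate statement after the rollback:

```typescript
// Minimal query interface; this sketch imports no database driver.
interface QueryClient {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

interface Statement { sql: string; params: unknown[]; }

// Run `work` inside a transaction, rolling back on any error.
async function withTransaction<T>(client: QueryClient, work: () => Promise<T>): Promise<T> {
  await client.query("BEGIN");
  try {
    const result = await work();
    await client.query("COMMIT");
    return result;
  } catch (error) {
    await client.query("ROLLBACK");
    throw error;
  }
}

// Apply the statements in the documented order (run update, snapshot,
// category scores, flags, articles, weights, AI News): all land or none do.
async function persistSnapshot(client: QueryClient, statements: Statement[]): Promise<void> {
  await withTransaction(client, async () => {
    for (const s of statements) {
      await client.query(s.sql, s.params);
    }
  });
}
```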
15. Pipeline Run Lifecycle
Every pipeline execution creates a pipeline_runs record in the database at startup with a running status. This record serves as:
- A lock signal: if a run is already in progress (checked via `getRunningPipelineRun`), new runs are rejected with a 503 error.
- A stale-run safety mechanism: if a run is still marked `running` after 20 minutes, the system automatically marks it `failed` and allows the next run to proceed.
The run record tracks: articles fetched before deduplication, articles after deduplication (what the LLM actually received), total LLM tokens consumed, and a per-stage timing breakdown.
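The lock check and the 20-minute stale-run rule can be sketched as a pure decision function. The type and field names are hypothetical; `running` stands for whatever a lookup such as getRunningPipelineRun returns. A read-then-insert check like this can race when two triggers arrive at once; a database-level constraint that permits only one `running` row would close that gap:

```typescript
const STALE_RUN_MS = 20 * 60 * 1000; // 20 minutes

interface RunningRun { id: string; startedAt: Date; }

type LockDecision =
  | { action: "start" }
  | { action: "reject"; httpStatus: 503 }
  | { action: "expire-then-start"; staleRunId: string };

function decideRunStart(running: RunningRun | null, now: Date): LockDecision {
  if (running === null) return { action: "start" };
  const ageMs = now.getTime() - running.startedAt.getTime();
  // A run stuck in `running` past the limit is marked failed and replaced.
  if (ageMs > STALE_RUN_MS) return { action: "expire-then-start", staleRunId: running.id };
  return { action: "reject", httpStatus: 503 };
}
```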
16. The Scheduling System
The pipeline does not run on an internal timer. An external scheduling system (a CI/CD workflow) runs every 30 minutes and reads the timestamp of the last completed pipeline run from the health endpoint. If that run completed more than six hours ago, it sends a signed POST request to the internal pipeline trigger endpoint; otherwise, it skips.
This approach decouples the scheduling concern from the application process, making the application stateless from the scheduler's perspective. The six-hour cadence targets four daily runs: midnight, 6am, noon, and 6pm in a reference timezone.
Manual runs are always available via the same internal endpoint, protected by a timestamp-validated HMAC signature.
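The scheduler's decision and the timestamp-validated HMAC check can be sketched with Node's crypto module. The signing message format (timestamp, a dot, then the body), hex encoding, and function names are hypothetical; the text specifies only an HMAC signature validated against a timestamp within ±5 minutes. The shared-secret header alternative is not shown:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const RUN_INTERVAL_MS = 6 * 60 * 60 * 1000; // six hours
const SIGNATURE_WINDOW_MS = 5 * 60 * 1000;  // plus or minus five minutes

// Scheduler side: trigger only if no run has completed in the last six hours.
function shouldTrigger(lastCompletedAt: Date | null, now: Date): boolean {
  return lastCompletedAt === null || now.getTime() - lastCompletedAt.getTime() > RUN_INTERVAL_MS;
}

// Hypothetical signing scheme: hex HMAC-SHA256 over "<timestampMs>.<body>".
function signRequest(secret: string, timestampMs: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestampMs}.${body}`).digest("hex");
}

// Server side: reject timestamps outside the window, then compare in constant time.
function verifyRequest(secret: string, timestampMs: number, body: string, signature: string, nowMs: number): boolean {
  if (!Number.isFinite(timestampMs) || Math.abs(nowMs - timestampMs) > SIGNATURE_WINDOW_MS) {
    return false;
  }
  const expected = Buffer.from(signRequest(secret, timestampMs, body), "hex");
  const provided = Buffer.from(signature, "hex");
  // timingSafeEqual throws on a length mismatch, so check length first.
  return provided.length === expected.length && timingSafeEqual(provided, expected);
}
```

Signing the timestamp together with the body means a captured request cannot be replayed after the window closes, and cannot be replayed with a fresh timestamp without the secret.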
17. The API Layer
The HTTP server exposes a small, read-mostly API consumed exclusively by the frontend. All routes are GET except the two internal POST endpoints.
GET /scores/current
Returns the most recent completed snapshot in the frontend-facing format. This response includes:
- The top-level score with delta from the previous snapshot, a qualitative label, a generated narrative summary paragraph, and a quiet-period flag.
- All nine category cards, each with a score (integer 0–100), confidence (integer percentage), reasoning text, disagreement display, the top-weighted news article for that category, the scoring tier, article count, and hours since last fresh data.
- Coverage metrics derived from the articles in the snapshot: story count, outlet count, coverage percentage, and window hours.
- Source balance metrics showing the political bias distribution of contributing outlets (left/center/right/unknown as percentages), computed over distinct outlets, not article volume.
- The AI News section with a summary and up to five relevant articles.
The topNews field for each category is the article with the highest weight for that category. If two articles tie, the newest one wins. The same article cannot appear as topNews for two different categories — a deduplication pass ensures each article is used at most once.
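The topNews rule can be sketched as a greedy pass over the categories. Processing categories in a fixed order, and giving a category its next-best unused article when its top pick is already taken, are assumptions about how the deduplication pass resolves conflicts; all names are hypothetical:

```typescript
interface WeightedArticle { id: string; publishedAt: Date; weights: Record<string, number>; }

function pickTopNews(categories: string[], articles: WeightedArticle[]): Record<string, string | null> {
  const used = new Set<string>();
  const result: Record<string, string | null> = {};
  for (const category of categories) {
    const ranked = articles
      .filter((a) => (a.weights[category] ?? 0) > 0 && !used.has(a.id))
      // Highest weight first; ties go to the newest article.
      .sort((x, y) =>
        (y.weights[category] ?? 0) - (x.weights[category] ?? 0) ||
        y.publishedAt.getTime() - x.publishedAt.getTime());
    const top = ranked.length > 0 ? ranked[0] : null;
    result[category] = top ? top.id : null;
    if (top) used.add(top.id); // each article is used at most once
  }
  return result;
}
```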
Cache headers on this endpoint are set to s-maxage=60, stale-while-revalidate=300, allowing CDN caching while keeping the data reasonably fresh.
GET /scores/history
Returns historical score data points for charting. Accepts window (7d, 30d, 90d, all) and categoryId (any public category ID or top-level) query parameters. Optionally returns all ten series (one per category plus top-level) in a single response when includeAll=true, enabling the frontend to toggle between trend lines without additional requests.
Spike detection is applied to history data points: a data point is flagged as a spike if its score increased by 25% or more relative to the previous point or, when the previous point is zero (where a percentage change is undefined), by at least 25 absolute points.
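The spike rule can be sketched as follows; the function names are hypothetical:

```typescript
// Relative rise of at least 25%, or at least 25 points when the previous score is 0.
function isSpike(previous: number, current: number): boolean {
  if (previous === 0) return current - previous >= 25;
  return (current - previous) / previous >= 0.25;
}

// The first point has no predecessor and is never a spike.
function markSpikes(scores: number[]): boolean[] {
  return scores.map((s, i) => i > 0 && isSpike(scores[i - 1], s));
}
```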
GET /scores/:snapshotId
Returns full historical detail for any past snapshot, including the complete list of articles that contributed, their category weights, and the reasoning text for each category. This enables historical drill-down.
GET /health
Returns database connectivity status and the timestamp of the last completed pipeline run. The scheduler uses this endpoint to decide whether to trigger a new run.
POST /internal/pipeline/run
Triggers a pipeline run immediately. Protected by either a shared secret header or a timestamped HMAC signature (valid within a ±5 minute window). Returns the full pipeline run summary including per-category scores, per-source fetch statistics, and stage timing breakdowns.
POST /internal/logs/export
Triggers export of recent application logs to a log aggregation service. Protected by the same authorization mechanism as the pipeline trigger.
18. Coverage and Source Balance Metrics
A key transparency feature is the coverage and source balance metadata attached to each snapshot. These are derived from the articles that contributed to the scoring, not from any external source.
Coverage metrics answer: how many unique stories and outlets contributed, and what fraction of the total tracked outlets are represented?
- `storyCount`: The number of distinct articles with a positive weight for this category.
- `outletCount`: The number of distinct news outlets contributing to this category.
- `trackedOutletCount`: The total number of distinct outlets across the entire snapshot (used as the denominator for `fetchCoveragePct`).
- `fetchCoveragePct`: `outletCount / trackedOutletCount × 100`. For the top level this is always 100%, because by definition the union of all outlets is the denominator.
- `windowHours`: The maximum lookback hours used in the scoring fetches for this snapshot (usually 6, or 24 if expanded).
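The per-category coverage fields can be computed directly from the contributing articles. The ContributingArticle shape and function name are hypothetical, and returning 0% when there are no tracked outlets is an assumption:

```typescript
interface ContributingArticle { id: string; outlet: string; weights: Record<string, number>; }

function coverageFor(category: string, articles: ContributingArticle[], windowHours: number) {
  // Denominator: every distinct outlet in the snapshot.
  const trackedOutletCount = new Set(articles.map((a) => a.outlet)).size;
  const contributing = articles.filter((a) => (a.weights[category] ?? 0) > 0);
  const storyCount = new Set(contributing.map((a) => a.id)).size;
  const outletCount = new Set(contributing.map((a) => a.outlet)).size;
  const fetchCoveragePct = trackedOutletCount === 0 ? 0 : (outletCount / trackedOutletCount) * 100;
  return { storyCount, outletCount, trackedOutletCount, fetchCoveragePct, windowHours };
}
```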
Source balance metrics answer: what is the political bias distribution of contributing outlets?
These are computed over distinct outlets, not over article volume. An outlet that published five articles counts once. If an outlet's bias is unknown (not in the built-in bias registry), it contributes to unknownPct. The four percentages (left, center, right, unknown) are calculated using a largest-remainder rounding algorithm to ensure they sum exactly to 100.
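Largest-remainder rounding floors every percentage and then hands the leftover points, one each, to the entries with the largest fractional parts, which guarantees a total of exactly 100. A sketch, assuming a registry that maps outlet names to leans; the registry shape and function names are hypothetical:

```typescript
type Lean = "left" | "center" | "right" | "unknown";
const LEANS: Lean[] = ["left", "center", "right", "unknown"];

// Count distinct outlets per lean; outlets missing from the registry are unknown.
function outletLeanCounts(outlets: string[], registry: Record<string, Lean>): Record<Lean, number> {
  const counts: Record<Lean, number> = { left: 0, center: 0, right: 0, unknown: 0 };
  for (const outlet of Array.from(new Set(outlets))) counts[registry[outlet] ?? "unknown"] += 1;
  return counts;
}

function largestRemainderPercents(counts: Record<Lean, number>): Record<Lean, number> {
  const result: Record<Lean, number> = { left: 0, center: 0, right: 0, unknown: 0 };
  const total = LEANS.reduce((s, k) => s + counts[k], 0);
  if (total === 0) return result;
  // Multiply before dividing to keep exact cases (e.g. 29 of 100) exact.
  const exact = LEANS.map((k) => ({ k, value: (counts[k] * 100) / total }));
  for (const e of exact) result[e.k] = Math.floor(e.value);
  let leftover = 100 - LEANS.reduce((s, k) => s + result[k], 0);
  const byRemainder = exact
    .slice()
    .sort((a, b) => (b.value - Math.floor(b.value)) - (a.value - Math.floor(a.value)));
  for (const e of byRemainder) {
    if (leftover <= 0) break;
    result[e.k] += 1;
    leftover -= 1;
  }
  return result;
}
```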
19. Observability and Logging
Every significant pipeline and API event is recorded in the database in an app_logs table. Each log record has a stable fingerprint derived from the event type, request ID, method, path, status code, pipeline run ID, message, and error message. Duplicate fingerprints are silently discarded, preventing log flooding from rapid retries.
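The fingerprint-and-discard behavior can be sketched in memory. Encoding the fields as a JSON array avoids ambiguity between, say, a message that contains a separator character and two separate fields; that encoding, the field names, and the LogBuffer class are assumptions. In a database the equivalent is typically a unique constraint on the fingerprint column combined with INSERT ... ON CONFLICT DO NOTHING:

```typescript
import { createHash } from "node:crypto";

interface LogEvent {
  eventType: string;
  requestId?: string;
  method?: string;
  path?: string;
  statusCode?: number;
  pipelineRunId?: string;
  message: string;
  errorMessage?: string;
}

function logFingerprint(e: LogEvent): string {
  const fields = [e.eventType, e.requestId, e.method, e.path, e.statusCode, e.pipelineRunId, e.message, e.errorMessage];
  // Missing fields become null so the encoding is stable.
  return createHash("sha256").update(JSON.stringify(fields.map((f) => f ?? null))).digest("hex");
}

class LogBuffer {
  private readonly seen = new Set<string>();
  readonly records: LogEvent[] = [];

  /** Returns false (and stores nothing) when the fingerprint was already seen. */
  add(e: LogEvent): boolean {
    const fp = logFingerprint(e);
    if (this.seen.has(fp)) return false;
    this.seen.add(fp);
    this.records.push(e);
    return true;
  }
}
```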
Logs are periodically exported to an external log aggregation service via a scheduled workflow that calls the POST /internal/logs/export endpoint. The export marks records as exported to prevent double-sending.
Pipeline runs emit structured JSON logs at each stage with timing, article counts, token usage, and error details. These logs are also persisted to the database for operational review.
20. Data Retention and Idempotency
All data is append-only. The pipeline never updates or deletes scoring data in normal operation. Historical snapshots, articles, and weights are retained indefinitely.
The pipeline is designed so that retries are safe: the unique constraint on (pipeline_run_id) in score_snapshots prevents a retried write from creating a duplicate snapshot for the same run. Given the same provider responses, the deterministic stages (normalization, fingerprinting, deduplication) produce the same article pool, although the LLM's output may vary between calls.
Article fingerprints are content-addressed: the same article fetched from multiple providers in multiple runs produces the same fingerprint as long as its title, source name, and publication hour match, which makes deduplication reliable across runs.
21. Replacing Any Vendor
The system is designed so that any of the three news providers, the LLM provider, or the database provider can be replaced with minimal changes. Here is what each replacement involves:
Replacing a news provider: Write a new fetcher module that implements the SourceFetcher interface. The interface requires a sourceName string and a fetch method that accepts a context object (config, timestamp, lookback hours, optional category filter) and returns NormalizedArticle[]. Register the new fetcher in the dependency setup. No other code changes are required.
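The SourceFetcher contract can be sketched as a TypeScript interface. The field names of the context object, the reduced NormalizedArticle shape (a subset of the Stage Two fields), and returning a Promise (since fetching is asynchronous) are assumptions; the StaticFetcher class is a hypothetical example implementation:

```typescript
interface FetchContext {
  config: Record<string, unknown>;
  now: Date;
  lookbackHours: number;
  category?: string; // optional category filter for supplementary fetches
}

// Subset of the NormalizedArticle fields described in Stage Two.
interface NormalizedArticle {
  fingerprint: string;
  title: string;
  url: string;
  sourceName: string;
  publishedAt: Date;
  sentimentScore: number | null;
}

interface SourceFetcher {
  sourceName: string;
  fetch(context: FetchContext): Promise<NormalizedArticle[]>;
}

// Hypothetical fetcher serving canned data, filtered to the lookback window.
class StaticFetcher implements SourceFetcher {
  readonly sourceName = "static_fixture";
  private readonly articles: NormalizedArticle[];

  constructor(articles: NormalizedArticle[]) {
    this.articles = articles;
  }

  async fetch(context: FetchContext): Promise<NormalizedArticle[]> {
    const cutoff = context.now.getTime() - context.lookbackHours * 60 * 60 * 1000;
    return this.articles.filter((a) => a.publishedAt.getTime() >= cutoff);
  }
}
```

A fixture like this can also stand in for a real provider in tests, since the pipeline only depends on the interface.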
Replacing the LLM provider: The AI provider client module wraps the LLM API call. Replace the client with any provider that accepts a system prompt, a user message, and returns text. The response parsing logic (which validates and normalizes the JSON output) is independent of the provider. The prompt contract defines what the LLM must return; the provider just needs to be capable of following structured JSON instructions reliably.
Replacing the database: The BackendStore interface defines all database operations as abstract method signatures. Replace the PostgresStore implementation with any implementation of this interface (e.g., a different SQL dialect, a hosted database service with a compatible driver) and update the connection configuration. Schema migrations would need to be adapted, but the application layer requires no changes.
Replacing the deployment platform: The application exposes a standard Node.js HTTP server and a Vercel-compatible serverless handler. The core logic is platform-agnostic.
22. End-to-End Data Flow Summary
Here is the complete journey of data from external sources to the API response:
1. The scheduler checks the health endpoint and determines a new run is due.
2. The scheduler sends a signed POST to /internal/pipeline/run.
3. The pipeline creates a running record in the database.
4. Three provider fetches run in parallel. Results are normalized to NormalizedArticle[].
5. All articles are merged and deduplicated by URL then by fingerprint.
6. Thin categories trigger supplementary fetches, potentially with an expanded time window.
7. Article content is fetched from source URLs (with SSRF protection and timeouts).
8. AI News articles are fetched, deduplicated against the scoring pool, and sent to the LLM separately.
9. Scoring articles are sent to the LLM in batches of up to 25. The LLM returns category scores, confidence values, reasoning text, disagreement flags, and per-article category weights.
10. The fallback scoring system applies to any category with insufficient fresh signal, producing carry-forward, decayed, or insufficient estimates as appropriate.
11. The top-level score is computed as a confidence-weighted average.
12. All data is written to the database in a single atomic transaction.
13. The pipeline run record is marked completed as part of that same transaction.
14. When the frontend requests /scores/current, the HTTP server reads the latest snapshot, enriches it with fallback-scored categories, computes coverage and source balance metrics, and returns the fully shaped response.
23. Key Design Principles
Separation of concerns: The scheduler, the pipeline, the database, and the HTTP server are each responsible for exactly one thing. No component reaches into another's domain.
Failure isolation: A single provider failure does not abort the pipeline. A scoring failure does not blank out a category. A missing AI News response does not fail the scoring run.
Transparency: Every score comes with a scoring tier that communicates how it was derived. Every response comes with coverage and source balance metadata. Every historical snapshot is queryable with its full article evidence trail.
Token efficiency: Pre-computed sentiment signals from providers reduce LLM token consumption significantly. The LLM synthesizes existing signal rather than re-deriving it from raw text.
Idempotency: Re-running the pipeline produces consistent results. Duplicate writes are prevented by database constraints. Log deduplication prevents record flooding.
Auditability: Every pipeline run, every article, every weight assignment, and every scoring decision is persisted to the database. Human review of any historical score is always possible.
This document describes the system as of the date it was written. The specific vendors, models, and API keys in use at any given time are configuration concerns and do not affect the accuracy of this document.