Insighta Labs+ Case Study

Why I Built This

The HNG stage required a backend that could ingest a name, call three external APIs concurrently (Genderize, Agify, and Nationalize) for demographic predictions, and persist the result. The core challenge wasn't the happy path; it was making the system useful beyond basic CRUD: searchable with plain-language queries, exportable, and secure.

Natural-language search over structured data is normally handled by a dedicated search engine. The constraint here was PostgreSQL only — no full-text index tricks. The parser had to extract gender, age group, age range, and country from a raw query string and translate each into typed SQL filters, while handling edge cases like 'female' containing 'male' as a substring.

My Approach

I built the auth layer around GitHub OAuth with PKCE, handling both browser and CLI flows from the same callback route by reading a mode param. Access tokens expire in 3 minutes; refresh tokens are hashed before storage, one-time-use, and rotated on every refresh call. The first authenticated user is promoted to admin automatically — no seed data or manual role assignment needed.
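The refresh-token rules above (hashed at rest, one-time use, rotated on every refresh) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the in-memory `storedHashes` set stands in for the real token table, SHA-256 is an assumed hash choice, and the function names are hypothetical.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical in-memory store standing in for the real refresh-token table.
// Only hashes are kept, so a leaked table does not expose usable tokens.
const storedHashes = new Set<string>();

// SHA-256 is an assumption; the source only says tokens are hashed before storage.
const hashToken = (token: string): string =>
  createHash("sha256").update(token).digest("hex");

// Issue a new opaque refresh token and persist only its hash.
function issueRefreshToken(): string {
  const token = randomBytes(32).toString("hex");
  storedHashes.add(hashToken(token));
  return token;
}

// Rotate: consume the presented token (one-time use) and hand back a fresh one.
function rotateRefreshToken(presented: string): string {
  const wasKnown = storedHashes.delete(hashToken(presented));
  if (!wasKnown) {
    throw new Error("Invalid or already-used refresh token");
  }
  return issueRefreshToken();
}
```

Because the old hash is deleted before the new token is issued, presenting the same refresh token twice fails on the second call.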

The natural-language search parser is a pure lexical system in nl-parsers.ts. It runs a fixed priority scan: gender check first (female before male to avoid substring collision), then age group via ternary chain, then age range shorthands, then regex-based above/below extraction, then a from-country regex that distinguishes ISO codes from full names by character length after compacting dots and spaces. All matched filters are AND-combined into a TypeORM QueryBuilder.
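A simplified sketch of that priority scan is shown below. It is not the contents of nl-parsers.ts: the `ProfileFilters` shape, the age-group keywords, and the exact regexes are assumptions made for illustration, and the age-range shorthands step is left out because the source does not list them.

```typescript
// Hypothetical filter shape; the real field names may differ.
interface ProfileFilters {
  gender?: "male" | "female";
  ageGroup?: string;
  minAge?: number;
  maxAge?: number;
  countryCode?: string;
  countryName?: string;
}

function parseQuery(raw: string): ProfileFilters {
  const q = raw.toLowerCase();
  const filters: ProfileFilters = {};

  // 1. Gender: "female" is tested first because it contains "male".
  if (q.includes("female")) filters.gender = "female";
  else if (q.includes("male")) filters.gender = "male";

  // 2. Age group via a ternary chain (keywords and group names are assumed).
  const group = q.includes("child") ? "child"
    : q.includes("teen") ? "teenager"
    : q.includes("adult") ? "adult"
    : q.includes("senior") ? "senior"
    : undefined;
  if (group !== undefined) filters.ageGroup = group;

  // 3. Regex-based above/below extraction.
  const above = q.match(/\b(?:above|over)\s+(\d+)/)?.[1];
  const below = q.match(/\b(?:below|under)\s+(\d+)/)?.[1];
  if (above !== undefined) filters.minAge = Number(above);
  if (below !== undefined) filters.maxAge = Number(below);

  // 4. "from <country>" at the end of the query: compact dots and spaces,
  //    then treat a two-letter result as an ISO code, anything else as a name.
  const fromText = q.match(/\bfrom\s+([a-z.\s]+)$/)?.[1];
  if (fromText !== undefined) {
    const compact = fromText.replace(/[.\s]/g, "");
    if (compact.length === 2) filters.countryCode = compact.toUpperCase();
    else filters.countryName = fromText.trim();
  }

  return filters;
}
```

With this sketch, `parseQuery("females above 30 from NG")` yields `{ gender: "female", minAge: 30, countryCode: "NG" }`. Each populated field would then become one AND-combined condition on the query builder.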

Key Features

GitHub OAuth with PKCE supporting both browser and CLI client flows
Short-lived access tokens (3 min) with rotating, hashed, one-time-use refresh tokens
Role-based access control with admin and analyst roles; first user auto-promoted to admin
Natural-language profile search parsing gender, age group, age range, and country from raw query strings
Streaming CSV import with incremental row parsing, batched inserts, and per-row skip reasons
Request logging middleware capturing method, path, status, duration, user, IP, and timestamp
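As an illustration of the request logging item, here is a framework-agnostic sketch. The `Req` and `Res` interfaces are minimal structural stand-ins rather than a specific framework's types, `req.user` is assumed to be attached by the auth layer, and `logRequests` and `RequestLog` are hypothetical names.

```typescript
// Minimal structural types so the sketch does not depend on a framework.
interface Req {
  method?: string;
  url?: string;
  user?: { id: string }; // assumed to be set by the auth layer
  socket: { remoteAddress?: string };
}
interface Res {
  statusCode: number;
  on(event: "finish", listener: () => void): unknown;
}

interface RequestLog {
  method: string;
  path: string;
  status: number;
  durationMs: number;
  user: string | null;
  ip: string | null;
  timestamp: string;
}

// Returns a middleware that emits one log entry per completed response.
function logRequests(sink: (entry: RequestLog) => void) {
  return (req: Req, res: Res, next: () => void): void => {
    const startedAt = Date.now();
    // "finish" fires once the response has been handed off, so the status
    // code and duration are final when the entry is built.
    res.on("finish", () => {
      sink({
        method: req.method ?? "UNKNOWN",
        path: req.url ?? "",
        status: res.statusCode,
        durationMs: Date.now() - startedAt,
        user: req.user?.id ?? null,
        ip: req.socket.remoteAddress ?? null,
        timestamp: new Date(startedAt).toISOString(),
      });
    });
    next();
  };
}
```

Logging on `finish` rather than at request entry is what makes status and duration available in the same record.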

How the System Works

A name submission triggers three concurrent external API calls; the result is persisted and immediately queryable via structured filters or natural-language search.

Client POSTs a name to /api/profiles — requires auth and X-API-Version: 1 header
Service fires fetchGender, fetchAge, and fetchNation concurrently via Promise.all
Any non-OK or empty upstream response throws a 502 identifying the failing API
Enriched profile is persisted to PostgreSQL via TypeORM; age group is assigned from Agify's age value at insert time
GET /api/profiles/search?q=... runs the lexical parser, builds a QueryBuilder with AND-chained WHERE clauses, and returns a paginated envelope with self/next/prev links
CSV import streams the uploaded file from Multer disk storage through csv-parser, processes rows in batches, skips malformed rows, and returns a summary of inserted vs skipped with reasons
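The concurrent enrichment step and its 502 behaviour might look roughly like the sketch below. The fetchers are passed in as parameters so the example stays self-contained; `UpstreamError`, `callUpstream`, `enrichProfile`, and the return shapes are assumptions, and only the names fetchGender, fetchAge, and fetchNation come from the description above.

```typescript
// Error carrying the status the API layer would return to the client.
class UpstreamError extends Error {
  readonly status = 502;
  readonly api: string;
  constructor(api: string) {
    super(`${api} returned an invalid response`);
    this.api = api;
  }
}

// A fetcher resolves to null when the upstream answer is empty.
type Fetcher<T> = (name: string) => Promise<T | null>;

// Wrap one upstream call so any failure or empty result names the failing API.
async function callUpstream<T>(api: string, fetcher: Fetcher<T>, name: string): Promise<T> {
  let result: T | null;
  try {
    result = await fetcher(name);
  } catch {
    throw new UpstreamError(api);
  }
  if (result === null) throw new UpstreamError(api);
  return result;
}

interface Fetchers {
  fetchGender: Fetcher<string>;
  fetchAge: Fetcher<number>;
  fetchNation: Fetcher<string>;
}

// Fire all three calls at once; Promise.all rejects on the first failure.
async function enrichProfile(name: string, f: Fetchers) {
  const [gender, age, countryId] = await Promise.all([
    callUpstream("Genderize", f.fetchGender, name),
    callUpstream("Agify", f.fetchAge, name),
    callUpstream("Nationalize", f.fetchNation, name),
  ]);
  return { name, gender, age, countryId };
}
```

Since `Promise.all` rejects as soon as any call fails, the caller sees a single `UpstreamError` whose `api` field identifies the upstream to report in the 502 response.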

Engineering Challenges & Lessons Learned

The 'female' substring problem: a check like includes('male') also matches inside 'female', so a query like 'show females' would incorrectly set gender to male if the checks ran in the wrong order. Fixed by always evaluating 'female'/'females' before 'male'/'males' in the conditional chain, and documented explicitly so future contributors don't accidentally reorder it.
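The failure mode is easy to reproduce in isolation. The two functions below are illustrative only (the names are hypothetical): the first checks in the wrong order, the second uses the order described above.

```typescript
type Gender = "male" | "female" | undefined;

// Wrong order: "male" is a substring of "female", so this branch wins too early.
function detectGenderNaive(query: string): Gender {
  const q = query.toLowerCase();
  if (q.includes("male")) return "male";
  if (q.includes("female")) return "female"; // never reached for "female" queries
  return undefined;
}

// Correct order: test the longer word first, then fall back to the shorter one.
function detectGender(query: string): Gender {
  const q = query.toLowerCase();
  if (q.includes("female")) return "female";
  if (q.includes("male")) return "male";
  return undefined;
}
```

Here `detectGenderNaive("show females")` returns `"male"`, while `detectGender("show females")` returns `"female"` and `detectGender("show males")` still returns `"male"`.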

CSV imports with 50k rows couldn't be buffered in memory. Switched to Multer disk storage so the file lands on disk first, then the import endpoint reads it as a stream through createReadStream and pipes it into csv-parser. Rows are processed in batches and invalid ones are skipped rather than aborting the whole upload, with each skip reason tracked in the response summary.
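The batching and skip-tracking logic can be sketched independently of the file handling. In this hedged example `importRows`, `validate`, the `name` column check, and the batch size of 500 are all assumptions; rows arrive as any async iterable, which is how a Node readable stream can be consumed with `for await`.

```typescript
type Row = Record<string, string>;

interface ImportSummary {
  inserted: number;
  skipped: { row: number; reason: string }[];
}

// Hypothetical validation: returns a skip reason, or null when the row is usable.
function validate(row: Row): string | null {
  const name = row["name"];
  if (name === undefined || name.trim() === "") return "missing name";
  return null;
}

async function importRows(
  rows: AsyncIterable<Row>,
  insertBatch: (batch: Row[]) => Promise<void>,
  batchSize = 500,
): Promise<ImportSummary> {
  const summary: ImportSummary = { inserted: 0, skipped: [] };
  let batch: Row[] = [];
  let rowNumber = 0;

  for await (const row of rows) {
    rowNumber += 1;
    const reason = validate(row);
    if (reason !== null) {
      // Skip the bad row but keep going; the reason ends up in the summary.
      summary.skipped.push({ row: rowNumber, reason });
      continue;
    }
    batch.push(row);
    if (batch.length >= batchSize) {
      await insertBatch(batch);
      summary.inserted += batch.length;
      batch = [];
    }
  }

  // Flush whatever is left after the stream ends.
  if (batch.length > 0) {
    await insertBatch(batch);
    summary.inserted += batch.length;
  }
  return summary;
}
```

In the setup described above, `rows` would be the stream produced by piping `createReadStream` on the uploaded file into csv-parser, so only one batch of rows is held in memory at a time.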

What I’d Improve Next

Replace the lexical NL parser with a small LLM call or a proper NLP library — the current system can't handle negation, OR logic, or paraphrasing
Add a profile update endpoint so inferred data can be refreshed against the upstream APIs
Move from SQLite to a proper PostgreSQL instance for multi-process deployments
