Insighta Labs+

Built during the HNG internship: a NestJS backend that enriches submitted names with demographic predictions, persists profiles in PostgreSQL, and exposes a natural-language search API with GitHub OAuth, RBAC, CSV import/export, and rate limiting.

NestJS
TypeScript
PostgreSQL
TypeORM
JWT

Why I Built This

The HNG stage required a backend that could ingest a name and return demographic predictions by calling three external APIs concurrently (Genderize, Agify, and Nationalize), then persist the result. The core challenge wasn't the happy path; it was making the system useful beyond basic CRUD: searchable with non-technical queries, exportable, and secure.

Natural-language search over structured data is usually handled by a dedicated search engine. Here the constraint was PostgreSQL only, with no full-text index tricks. The parser had to extract gender, age group, age range, and country from a raw query string and translate each into a typed SQL filter, while handling edge cases such as 'female' containing 'male' as a substring.

My Approach

I built the auth layer around GitHub OAuth with PKCE, handling both browser and CLI flows from the same callback route by reading a mode parameter. Access tokens expire in 3 minutes; refresh tokens are hashed before storage, single-use, and rotated on every refresh call. The first user to authenticate is promoted to admin automatically, so no seed data or manual role assignment is needed.
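
A minimal sketch of the refresh-token rotation described above, assuming SHA-256 as the hash (the write-up does not name the algorithm) and an in-memory Map standing in for the database table. RefreshTokenStore, issue, and rotate are hypothetical names, and minting the 3-minute access token is left out.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical stored record; the real entity/schema is not shown in the write-up.
interface RefreshRecord {
  userId: string;
  used: boolean;
}

// Only a SHA-256 digest of the token is persisted, never the raw value.
function hashToken(raw: string): string {
  return createHash("sha256").update(raw).digest("hex");
}

class RefreshTokenStore {
  // Keyed by token hash; a Map stands in for the refresh-token table.
  private readonly records = new Map<string, RefreshRecord>();

  // Issue a new opaque token for the user and store only its hash.
  issue(userId: string): string {
    const raw = randomBytes(32).toString("base64url");
    this.records.set(hashToken(raw), { userId, used: false });
    return raw;
  }

  // Exchange a refresh token exactly once, returning its replacement.
  rotate(raw: string): { userId: string; refreshToken: string } {
    const record = this.records.get(hashToken(raw));
    if (!record || record.used) {
      throw new Error("invalid or already-used refresh token");
    }
    record.used = true; // one-time use: this token can never be exchanged again
    return { userId: record.userId, refreshToken: this.issue(record.userId) };
  }
}
```

A fast hash such as SHA-256 is reasonable here because the tokens are 32 random bytes rather than user-chosen passwords. A stricter variant could revoke every token for the user when an already-used one is presented, since reuse usually signals theft.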

The natural-language search parser in nl-parsers.ts is purely lexical. It runs a fixed-priority scan: gender first (female before male, to avoid the substring collision), then age group via a ternary chain, then age-range shorthands, then regex-based above/below extraction, and finally a from-country regex that distinguishes ISO codes from full names by their character length once dots and spaces are stripped. All matched filters are AND-combined in a TypeORM QueryBuilder.
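
The scan order above can be sketched as a single pure function. This is an illustration, not the contents of nl-parsers.ts: the name parseQuery, the filter field names, the age-group vocabulary, the "young" = 16 to 24 shorthand, and the two-character ISO rule are all assumptions.

```typescript
// Filter shape produced by the parser; field names are illustrative.
interface ProfileFilters {
  gender?: "male" | "female";
  ageGroup?: "child" | "teenager" | "adult" | "senior";
  minAge?: number;
  maxAge?: number;
  countryId?: string; // assumed two-letter ISO code, e.g. "NG"
  countryName?: string; // full name as typed, e.g. "nigeria"
}

function parseQuery(raw: string): ProfileFilters {
  const q = raw.toLowerCase().trim();
  const filters: ProfileFilters = {};

  // 1. Gender: "female" must be tested before "male" (substring collision).
  if (q.includes("female")) filters.gender = "female";
  else if (q.includes("male")) filters.gender = "male";

  // 2. Age group via a ternary chain (vocabulary is assumed).
  const ageGroup =
    q.includes("child") ? "child"
    : q.includes("teen") ? "teenager"
    : q.includes("adult") ? "adult"
    : q.includes("senior") ? "senior"
    : undefined;
  if (ageGroup) filters.ageGroup = ageGroup;

  // 3. Age-range shorthand (hypothetical mapping).
  if (/\byoung\b/.test(q)) {
    filters.minAge = 16;
    filters.maxAge = 24;
  }

  // 4. Explicit bounds; these override the shorthand when both appear.
  const above = q.match(/\b(?:above|over|older than)\s+(\d+)/);
  if (above) filters.minAge = Number(above[1]);
  const below = q.match(/\b(?:below|under|younger than)\s+(\d+)/);
  if (below) filters.maxAge = Number(below[1]);

  // 5. Country after "from", stopping at an age keyword or end of string.
  const from = q.match(/\bfrom\s+(.+?)(?=\s+(?:above|over|below|under|older|younger|aged)\b|$)/);
  if (from) {
    const compact = from[1].replace(/[.\s]/g, ""); // "u.k." -> "uk"
    if (compact.length === 2) filters.countryId = compact.toUpperCase();
    else filters.countryName = from[1].trim();
  }
  return filters;
}
```

Each populated field would then become one andWhere call on a TypeORM SelectQueryBuilder, for example qb.andWhere("profile.gender = :gender", { gender: filters.gender }), where the profile alias is hypothetical.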

Key Features

  • GitHub OAuth with PKCE supporting both browser and CLI client flows
  • Short-lived access tokens (3 min) with rotating, hashed, one-time-use refresh tokens
  • Role-based access control with admin and analyst roles; first user auto-promoted to admin
  • Natural-language profile search parsing gender, age group, age range, and country from raw query strings
  • Streaming CSV import with incremental row parsing, batched inserts, and per-row skip reasons
  • Request logging middleware capturing method, path, status, duration, user, IP, and timestamp
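
For the PKCE item, here is a sketch of the standard S256 derivation from RFC 7636, as a CLI or browser client might perform it before starting the authorize step. It is not the project's code; the function names are hypothetical, and the write-up does not say where in this project's flow the verifier is checked.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// 32 random bytes -> 43 base64url characters, inside RFC 7636's 43-128 range.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256: code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), unpadded.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// Server side: recompute the challenge from the presented verifier and compare
// in constant time against the challenge stored at the authorize step.
function verifyPkce(verifier: string, storedChallenge: string): boolean {
  const actual = Buffer.from(codeChallengeS256(verifier));
  const expected = Buffer.from(storedChallenge);
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}
```

RFC 7636 Appendix B gives the verifier dBjftJeZ4CVP-mJ92K9qMDMtVyz3aFyaJlyrl7fNPn0 and the expected challenge E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM, which makes a convenient unit test.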

How the System Works

A name submission triggers three concurrent external API calls; the result is persisted and immediately queryable via structured filters or natural-language search.

  1. Client POSTs a name to /api/profiles (requires authentication and an X-API-Version: 1 header)
  2. Service fires fetchGender, fetchAge, and fetchNation concurrently via Promise.all
  3. Any non-OK or empty upstream response throws a 502 identifying the failing API
  4. Enriched profile is persisted to PostgreSQL via TypeORM; age group is assigned from Agify's age value at insert time
  5. GET /api/profiles/search?q=... runs the lexical parser, builds a QueryBuilder with AND-chained WHERE clauses, and returns a paginated envelope with self/next/prev links
  6. CSV import streams the uploaded file from Multer disk storage through csv-parser, processes rows in batches, skips malformed rows, and returns a summary of inserted vs skipped with reasons
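
Steps 2 to 4 can be sketched as follows. The write-up names fetchGender, fetchAge, and fetchNation; this sketch folds them into one generic helper with an injected fetcher so it runs without network access. The UpstreamError class, what counts as an "empty" response, picking the highest-probability country, and the age-group cut-offs are assumptions; the endpoints are the public Genderize, Agify, and Nationalize URLs.

```typescript
// Minimal response shapes, trimmed to the fields used below.
interface GenderizeResponse { gender: string | null; probability: number }
interface AgifyResponse { age: number | null }
interface NationalizeResponse { country: { country_id: string; probability: number }[] }

// Anything fetch-like: Node 18+'s global fetch fits, and so does a test fake.
type JsonFetcher = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Hypothetical error carrying the HTTP status and the failing API's name.
class UpstreamError extends Error {
  readonly status = 502;
  readonly api: string;
  constructor(api: string) {
    super(`${api} returned an invalid response`);
    this.api = api;
    Object.setPrototypeOf(this, UpstreamError.prototype); // keep instanceof working on ES5 targets
  }
}

async function fetchJson<T>(fetcher: JsonFetcher, api: string, url: string): Promise<T> {
  const res = await fetcher(url);
  if (!res.ok) throw new UpstreamError(api); // non-OK status -> 502
  return (await res.json()) as T;
}

// Assumed cut-offs; the write-up only says the group comes from Agify's age.
function toAgeGroup(age: number): "child" | "teenager" | "adult" | "senior" {
  return age <= 12 ? "child" : age <= 19 ? "teenager" : age <= 59 ? "adult" : "senior";
}

async function enrichName(name: string, fetcher: JsonFetcher) {
  const q = encodeURIComponent(name);
  // All three requests start together; Promise.all rejects on the first failure.
  const [g, a, n] = await Promise.all([
    fetchJson<GenderizeResponse>(fetcher, "Genderize", `https://api.genderize.io?name=${q}`),
    fetchJson<AgifyResponse>(fetcher, "Agify", `https://api.agify.io?name=${q}`),
    fetchJson<NationalizeResponse>(fetcher, "Nationalize", `https://api.nationalize.io?name=${q}`),
  ]);
  // "Empty" responses (assumed meaning): no gender, no age, or no country candidates.
  const gender = g.gender;
  if (!gender) throw new UpstreamError("Genderize");
  const age = a.age;
  if (age === null || age === undefined) throw new UpstreamError("Agify");
  if (!n.country?.length) throw new UpstreamError("Nationalize");
  const top = n.country.reduce((best, c) => (c.probability > best.probability ? c : best));
  return {
    name,
    gender,
    genderProbability: g.probability,
    age,
    ageGroup: toAgeGroup(age),
    countryId: top.country_id,
    countryProbability: top.probability,
  };
}
```

In a NestJS service, BadGatewayException from @nestjs/common would be the natural way to surface the 502. Because Promise.all rejects with the first rejection, the error names whichever upstream failed first.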

Engineering Challenges & Lessons Learned

The 'female' substring problem: includes() on 'male' matches inside 'female', so a query like 'show females' would incorrectly set gender to male if the male check ran first. The fix is to always evaluate 'female'/'females' before 'male'/'males' in the conditional chain, with the ordering documented explicitly so future contributors don't accidentally swap it.
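
The ordering bug is easy to reproduce in isolation. Both functions below are hypothetical illustrations; only the female-before-male ordering comes from the project.

```typescript
type Gender = "male" | "female";

// Buggy order: "females".includes("male") is true, so the female branch is unreachable.
function detectGenderWrongOrder(query: string): Gender | undefined {
  const q = query.toLowerCase();
  if (q.includes("male")) return "male";
  if (q.includes("female")) return "female";
  return undefined;
}

// Fixed order: test the longer word first.
function detectGender(query: string): Gender | undefined {
  const q = query.toLowerCase();
  if (q.includes("female")) return "female"; // must stay before the "male" check
  if (q.includes("male")) return "male";
  return undefined;
}
```

A word-boundary regex such as /\bmales?\b/ would remove the dependence on ordering altogether; that is an alternative, not what the project uses.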

CSV imports with 50k rows couldn't be buffered in memory. Switched to Multer disk storage so the file lands on disk first, then the import endpoint reads it as a stream through createReadStream and pipes it into csv-parser. Rows are processed in batches and invalid ones are skipped rather than aborting the whole upload, with each skip reason tracked in the response summary.
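
A sketch of the batching and skip-reason bookkeeping, written against any AsyncIterable of parsed rows so it runs without Multer or csv-parser. The column names, validation rules, default batch size of 500, and the insertBatch callback are assumptions; in the real service the callback would presumably wrap a TypeORM bulk insert.

```typescript
// A parsed CSV row as csv-parser emits it: header -> cell text.
type CsvRow = Record<string, string>;

interface ProfileRow { name: string; gender: "male" | "female"; age: number; countryId: string }
interface SkippedRow { row: number; reason: string }
interface ImportSummary { inserted: number; skipped: SkippedRow[] }

// Hypothetical validation rules; returns a valid row or a skip reason.
function validateRow(row: CsvRow): ProfileRow | string {
  const name = (row.name ?? "").trim();
  if (!name) return "missing name";
  const gender = (row.gender ?? "").trim().toLowerCase();
  if (gender !== "male" && gender !== "female") return "invalid gender";
  const ageText = (row.age ?? "").trim();
  if (!/^\d{1,3}$/.test(ageText)) return "invalid age";
  const countryId = (row.country_id ?? "").trim().toUpperCase();
  if (!/^[A-Z]{2}$/.test(countryId)) return "invalid country_id";
  return { name, gender, age: Number(ageText), countryId };
}

async function importRows(
  rows: AsyncIterable<CsvRow>,
  insertBatch: (batch: ProfileRow[]) => Promise<void>,
  batchSize = 500,
): Promise<ImportSummary> {
  const summary: ImportSummary = { inserted: 0, skipped: [] };
  let batch: ProfileRow[] = [];
  let rowNumber = 0; // 1-based data row, header excluded
  for await (const row of rows) {
    rowNumber += 1;
    const result = validateRow(row);
    if (typeof result === "string") {
      summary.skipped.push({ row: rowNumber, reason: result }); // skip, don't abort
      continue;
    }
    batch.push(result);
    if (batch.length >= batchSize) {
      await insertBatch(batch); // reading pauses while the batch is written
      summary.inserted += batch.length;
      batch = [];
    }
  }
  if (batch.length > 0) {
    await insertBatch(batch); // flush the final partial batch
    summary.inserted += batch.length;
  }
  return summary;
}
```

Node readable streams are async-iterable, so importRows(createReadStream(path).pipe(csv()), insertBatch) would feed csv-parser output straight in. Because the loop awaits each insert before pulling more rows, stream backpressure keeps memory bounded regardless of file size.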

What I’d Improve Next

  • Replace the lexical NL parser with a small LLM call or a proper NLP library; the current system can't handle negation, OR logic, or paraphrasing
  • Add a profile update endpoint so inferred data can be refreshed against the upstream APIs
  • Move from SQLite to a proper PostgreSQL instance for multi-process deployments