Jose Mancera / Machine Learning Engineer
Machine-learning engineer at LinkedIn, working on video recommendations — vision-language models, retrieval, and content understanding.
01 About
Before any of this, I was a sergeant in the Marine Corps. It taught me to ignore the noise and reason from first principles, which, it turns out, is most of engineering.
I left, studied computer science, and started building machine-learning systems. The work I care about lives where research meets production — models that have to be genuinely good and actually ship to real people.
The ML is the fun part, not the whole point. What I like is hard problems, the people they're for, and doing the unglamorous work well.
02 Toolkit
Less a list of tools, more the shape of the problems I like.
Ranking, embeddings, and the systems that decide what you see next.
Vision-language models and content understanding across video and text.
Distributed training and serving large models without the whole thing falling over.
Python and PyTorch, plus whatever the problem actually needs.
03 Path
Machine-learning engineer on video recommendations, in Mountain View.
A run of engineering internships across recommendation, infrastructure, and large-model serving.
An M.S. in progress at Notre Dame; a B.S. from Baylor before it.
Where it started. Out as a sergeant.
04 Selected Work
A few projects from outside the day job — built to learn, not to impress.
A transformer language model assembled from the raw parts — tokenizer, attention, training loop — in PyTorch. The goal was to understand every line.
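The central piece of such a model is causal self-attention. Each position mixes in information from itself and from earlier positions, never from later ones. The sketch below shows single-head scaled dot-product attention in plain Python so that every step is visible. It is an illustration, not the project's PyTorch code; `causal_attention` and the toy 2-D inputs are hypothetical. Real GPT layers also learn separate query, key, and value projections and run several heads in parallel. This sketch omits both by passing the same vectors as q, k, and v.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(
    q: list[list[float]], k: list[list[float]], v: list[list[float]]
) -> list[list[float]]:
    """Single-head scaled dot-product attention with a causal mask.

    Position t may only attend to positions 0..t, so the model cannot
    peek at future tokens while learning next-token prediction.
    """
    d = len(q[0])
    out = []
    for t, q_t in enumerate(q):
        # Score only the current and earlier positions: this is the causal mask.
        scores = [
            sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the visible values.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_attention(x, x, x)[0])  # → [1.0, 0.0]
```

Row t only looks at rows 0 through t, so changing a later token leaves earlier outputs unchanged. Tensor implementations usually get the same effect by computing the full score matrix at once and setting the upper triangle to negative infinity before the softmax.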
Mancera1/gpt-from-scratch
A RAG system that grounds a language model in a real document store and serves it on Kubernetes. Vector search in, sourced answers out.
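"Vector search in" comes down to scoring every stored chunk against the query and keeping the best matches together with their ids. Keeping the ids is what allows answers to be sourced. The sketch below replaces learned embeddings and a real vector store with plain word-count vectors and cosine similarity, so it runs on the standard library alone. `embed`, `retrieve`, and the handbook-style chunk ids are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a learned embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, store: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the k (source_id, score) pairs most similar to the query."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical chunks standing in for a document store.
store = {
    "ch1#nudges": "A nudge changes behaviour by changing the default option.",
    "ch2#surveys": "Survey wording affects how people report behaviour.",
    "ch3#ab-tests": "An A/B test compares two variants on randomised users.",
}
print([doc_id for doc_id, _ in retrieve("What is a default nudge?", store, k=1)])  # → ['ch1#nudges']
```

A production store would precompute and index the document vectors rather than re-embed every chunk on each query. The brute-force loop here just keeps the idea visible.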
Mancera1/rag-system
The two above, wired together: my GPT is the generator inside my RAG system, built from my own tokenizer, retrieval, and transformer with no external APIs. In the demo below, it answers questions over a behavioural-data-science handbook.
It's deliberately tiny and barely trained, and that's the point: it shows how the earliest GPTs actually began, and from these foundations you can see exactly where the later improvements come from.
The next step isn't a better architecture; it's training on far more real data, on a GPU. I've only scratched the surface there, but the core idea is proven.
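Wiring the GPT in as the RAG generator means that, between retrieval and generation, ranked passages are packed together with their source ids into the prompt the model continues. The sketch below shows that step only. The prompt wording, the `build_prompt` name, and the character budget are assumptions for illustration, the actual project may format its context differently, and the generation call itself is not shown.

```python
def build_prompt(
    question: str, passages: list[tuple[str, str]], max_chars: int = 500
) -> str:
    """Pack ranked (source_id, text) passages ahead of the question.

    A tiny generator has a short context window, so passages are added in
    rank order until an approximate character budget (newlines ignored)
    would be exceeded.
    """
    header = "Use these sources:"
    lines = [header]
    used = len(header)
    for source_id, text in passages:
        # Tag each passage with its id so the answer can point back to it.
        entry = f"[{source_id}] {text}"
        if used + len(entry) > max_chars:
            break  # Lower-ranked passages that don't fit are dropped.
        lines.append(entry)
        used += len(entry)
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


# Hypothetical retrieved passages, best match first.
passages = [
    ("ch1#nudges", "A nudge changes behaviour by changing the default option."),
    ("ch3#ab-tests", "An A/B test compares two variants on randomised users."),
]
print(build_prompt("What is a default nudge?", passages))
```

Keeping the ids beside each passage is what makes "sourced answers out" possible. Whatever the generator produces, the system still knows which chunks it was conditioned on.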