LinkedIn's feed reaches more than 1.3 billion members, and the architecture behind it hadn't kept pace. The system had accrued five separate retrieval pipelines, each with its own infrastructure and optimization logic, serving different slices of what users might want to see. Engineers at the company spent the last year tearing that apart and replacing it with a single LLM-based system. The result, LinkedIn says, is a feed that understands professional context more precisely and costs less to run at scale.
The redesign touched three layers of the stack: how content is retrieved, how it's ranked, and how the underlying compute is managed. Tim Jurka, VP of engineering at LinkedIn, told VentureBeat the team ran hundreds of tests over the past year before reaching a milestone that, he says, reinvented a large chunk of its infrastructure.
"Starting from our entire system for retrieving content, we've moved over to using really large-scale LLMs to understand content much more richly on LinkedIn and be able to match it in a much more personalized way to members," Jurka said. "All the way to how we rank content, using really, really large sequence models, generative recommenders, and combining that end-to-end system to make things much more relevant and meaningful for members."
One feed, 1.3 billion members
The core challenge, Jurka said, is two-sided: LinkedIn has to match members' stated professional interests (their title, skills, industry) to their actual behavior over time, and it has to surface content that goes beyond what their immediate network is posting. These two signals frequently pull in different directions.
People use LinkedIn in different ways: some look to connect with others in their industry, others prioritize thought leadership, and job seekers and recruiters use it to find candidates.
How LinkedIn unified five pipelines into one
LinkedIn has spent more than 15 years building AI-driven recommendation systems, including prior work on job search and people search. LinkedIn's feed, the one that greets you when you open the site, was built on a heterogeneous architecture, the company said in a blog post. Content served to users came from various sources, including a chronological index of a user's network, geographic trending topics, interest-based filtering, industry-specific content, and other embedding-based systems.
The company said this approach meant each source had its own infrastructure and optimization strategy. It worked, but maintenance costs soared. Jurka said using LLMs to scale out its new recommendation algorithm also meant updating the surrounding architecture around the feed.
"There's a lot that goes into that, including how we maintain that kind of member context in a prompt, making sure we provide the right data to hydrate the model, profile data, recent activity data, etc.," he said. "The second is how you actually sample the most meaningful kinds of data points to then fine-tune the LLM."
LinkedIn tested different iterations of the data mix in an offline testing environment.
One of LinkedIn's first hurdles in revamping its retrieval system involved converting its data into text for LLMs to process. To do that, LinkedIn built a prompt library that lets teams create templated sequences. For posts, LinkedIn focused on format, author information, engagement counts, article metadata, and the post's text. For members, they incorporated profile data, skills, work history, education and "a chronologically ordered sequence of posts they've previously engaged with."
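The article doesn't show what those templates look like, but the idea of flattening structured records into LLM-readable text can be sketched roughly like this. All field names and the layout below are illustrative assumptions, not LinkedIn's actual prompt library:

```python
# Hypothetical sketch of a templated prompt builder. Field names and
# formatting are assumptions for illustration, not LinkedIn's schema.
def render_post_prompt(post: dict) -> str:
    """Flatten a post record into a text block an LLM can consume."""
    return "\n".join([
        f"format: {post['format']}",
        f"author: {post['author_title']} in {post['author_industry']}",
        f"engagement: likes={post['likes']} comments={post['comments']}",
        f"text: {post['text']}",
    ])

def render_member_prompt(member: dict) -> str:
    """Flatten a member profile plus a chronological engagement history."""
    history = " -> ".join(member["engaged_post_ids"])  # oldest to newest
    return (
        f"headline: {member['headline']}\n"
        f"skills: {', '.join(member['skills'])}\n"
        f"history: {history}"
    )
```

A template library like this keeps the text representation consistent across experiments, which matters when the team is iterating on the data mix offline.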
One of the most consequential findings from that testing phase involved how LLMs handle numbers. When a post had, say, 12,345 views, that figure appeared in the prompt as "views:12345," and the model treated it like any other text token, stripping it of its significance as a popularity signal. To fix this, the team broke engagement counts into percentile buckets and wrapped them in special tokens, so the model could distinguish them from unstructured text. The intervention meaningfully improved how the system weighs post reach.
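The bucketing idea is simple to sketch. The thresholds and token names below are invented for illustration; the article doesn't specify LinkedIn's actual bucket edges or vocabulary:

```python
import bisect

# Assumed bucket edges for view counts; LinkedIn's real percentile
# thresholds are not public.
VIEW_BUCKET_EDGES = [100, 1_000, 10_000, 100_000]

def bucketize_views(views: int) -> str:
    """Map a raw count to a special token like <views_p3>, so the model
    sees one categorical popularity signal instead of digit tokens."""
    bucket = bisect.bisect_right(VIEW_BUCKET_EDGES, views)
    return f"<views_p{bucket}>"
```

So 12,345 views becomes a single reserved token rather than five digit tokens, letting the model learn a distinct embedding for "moderately popular" versus "viral."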
Teaching the feed to read professional history as a sequence
Of course, if LinkedIn wants its feed to feel more personal and posts to reach the right audience, it needs to reimagine how it ranks posts, too. Traditional ranking models, the company said, misunderstand how people engage with content: engagement isn't random but follows patterns emerging from someone's professional journey.
LinkedIn built a proprietary Generative Recommender (GR) model for its feed that treats interaction history as a sequence, or "a professional story told through the posts you've engaged with over time."
"Instead of scoring each post in isolation, GR processes more than a thousand of your historical interactions to understand temporal patterns and long-term interests," LinkedIn's blog said. "As with retrieval, the ranking model relies on professional signals and engagement patterns, never demographic attributes, and is regularly audited for equitable treatment across our member base."
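The input side of that approach can be sketched as ordering a member's interactions in time and truncating to the most recent window. The 1,000-event cap comes from the article; the event fields and string encoding below are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    """One engagement event; fields are illustrative, not LinkedIn's schema."""
    timestamp: int
    post_id: str
    action: str  # e.g. "like", "comment", "share"

def build_sequence(history: list[Interaction], max_len: int = 1000) -> list[str]:
    """Order interactions chronologically and keep the most recent max_len,
    giving a sequence model temporal structure to attend over."""
    ordered = sorted(history, key=lambda e: e.timestamp)
    return [f"{e.action}:{e.post_id}" for e in ordered[-max_len:]]
```

The point of the sequence framing is that a comment on a post about systems design last month can inform how a new systems-design post is scored today, something per-post scoring can't capture.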
The compute cost of running LLMs at LinkedIn's scale
With a revitalized data pipeline and feed, LinkedIn faced another problem: GPU cost.
LinkedIn invested heavily in new training infrastructure to reduce how much it leans on GPUs. The biggest architectural shift was disaggregating CPU-bound feature processing from GPU-heavy model inference, keeping each type of compute doing what it's suited for rather than bottlenecking on GPU availability. The team also wrote custom C++ data loaders to cut the overhead that Python multiprocessing was adding, and built a custom Flash Attention variant to optimize attention computation during inference. Checkpointing was parallelized rather than serialized, which helped squeeze more out of available GPU memory.
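The disaggregation pattern is essentially a bounded producer-consumer pipeline: CPU workers prepare feature batches ahead of the accelerator so it never idles. The toy version below uses threads and a placeholder transform; everything here is an illustrative sketch, not LinkedIn's infrastructure:

```python
import queue
import threading

def cpu_featurize(raw: list[int]) -> list[float]:
    """Placeholder for CPU-bound feature processing."""
    return [x / 255.0 for x in raw]

def run_pipeline(raw_batches: list[list[int]], num_cpu_workers: int = 4):
    """CPU threads fill a bounded queue while a single consumer (standing
    in for the GPU) drains it; the bound provides backpressure so CPU
    work doesn't pile up faster than inference can consume it."""
    ready: queue.Queue = queue.Queue(maxsize=8)
    work: queue.Queue = queue.Queue()
    for b in raw_batches:
        work.put(b)

    def worker():
        while True:
            try:
                batch = work.get_nowait()
            except queue.Empty:
                return
            ready.put(cpu_featurize(batch))

    threads = [threading.Thread(target=worker) for _ in range(num_cpu_workers)]
    for t in threads:
        t.start()
    results = [ready.get() for _ in raw_batches]  # the "GPU" side
    for t in threads:
        t.join()
    return results
```

Because the two sides scale independently, CPU capacity can be added without buying GPUs, which is the economic point of the disaggregation.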
"One of the things we had to engineer for was that we needed to use a lot more GPUs than we'd want to," Jurka said. "Being very deliberate about how you coordinate between CPU and GPU workloads, because the nice thing about these kinds of LLMs and prompt context that we use to generate embeddings is you can dynamically scale them."
For engineers building recommendation or retrieval systems, LinkedIn's redesign offers a concrete case study in what replacing fragmented pipelines with a unified embedding model actually requires: rethinking how numerical signals are represented in prompts, separating CPU and GPU workloads deliberately, and building ranking models that treat user history as a sequence rather than a set of independent events. The lesson isn't that LLMs solve feed problems; it's that deploying them at scale forces you to solve a different class of problems than the ones you started with.

