How Meta is creating custom silicon for AI


Olivia Wu, Meta’s Technical Lead for Infra Silicon, discusses the design and development of Meta’s first-generation AI inference accelerator.

With the recent launches of MTIA v1, Meta’s first-generation AI inference accelerator, and Llama 2, the next generation of Meta’s publicly available large language model, it’s clear that Meta is focused on advancing AI for a more connected world. Fueling the success of these products are world-class infrastructure teams, including Meta’s custom AI silicon team, led by Olivia Wu, a leader in the silicon industry for 30 years.

In the conversation below, Olivia explains how she led the silicon design team to deliver Meta’s AI silicon, allowing the company to improve the compute efficiency of its infrastructure and enabling software developers to create AI models that provide more relevant content and better user experiences.

Tell us about your role at Meta.

Olivia Wu: I lead design development of the next generation of Meta’s AI silicon. My team is responsible for the design and development of Meta’s in-house machine learning (ML) accelerator, and I partner closely with our co-design, architecture, verification, implementation, emulation, validation, system, firmware, and software teams to successfully build and deploy the silicon in our data centers.

What led you to this role?

OW: I’ve been working in the silicon industry for 30 years and have experience working at a number of large companies, leading both architecture and design for multiple ASICs and IPs, and at startups focused on training AI. In 2018, I saw a social media post from Yann LeCun, our Chief AI Scientist, saying that Meta was looking for someone to help build AI silicon in-house. I knew of just a few other companies designing their own custom AI silicon, but they were mainly focused only on silicon and not on the software ecosystem and products.

The opportunity for Meta (known as Facebook back then) was to bring in silicon developers to work directly with the software teams to reimagine end-to-end systems, allowing for better efficiency and larger degrees of freedom in optimizing across hardware and software boundaries.

This was very attractive to me. I knew this was a rare opportunity and I had to jump on it to have the chance to build a design team from the ground up.

How was the transition from working at two different startups to working at Meta?

OW: My transition from startup to Meta was super easy. We had a very small team, so it almost felt like a startup within a large company. I was able to get involved in many parts of the project. It gave me the opportunity to be very hands-on in all aspects of ASIC development.

Meta also has a very open culture. The freedom to innovate and experiment with new ideas is ingrained into Meta’s DNA. I was able to have whiteboard sessions with members of co-design, software, hardware, and other cross-functional teams to brainstorm features that could go into the silicon. These discussions gave me a lot of insight into Meta’s critical AI workloads, the challenges that our software teams had encountered with the current solutions, and their future directions. Coming from a startup, where we had very limited visibility into customer workloads and the roadmap outside of what’s open sourced, this was very enlightening and refreshing.

What are some of the challenges you face in your current role?

OW: The silicon development cycle in general is fairly long. It usually spans anywhere from one and a half to two years, though it can take as long as four years in some cases. With AI advancing at a much faster clip, we’re really designing hardware for software that doesn’t yet exist. So the silicon has to be able to handle not just the demands of AI today, but future AI as well. To do this, we have to understand what our software team needs – AI workload trends they see, features they’ll need – and incorporate that into our design.

This is where we at Meta have an advantage. Because our silicon and software teams are both in-house, we have a front-row seat to what’s happening in software, and we’re able to incorporate it into our silicon from the beginning.

MTIA v1 was the very first silicon that we built at Meta, so one of the really challenging things was having to build out the entire design and verification flow from scratch, as well as the silicon development infrastructure itself. This was a lot of work at first, but it’s really paid off in the long run for the team.

Meta announced MTIA v1 earlier this year. What is the significance of this milestone to you and the company?

OW: MTIA v1 is Meta’s first-generation ML accelerator. It’s customized for our deep learning recommendation model, which is an important component of Meta technologies – including Facebook, Instagram, WhatsApp, Meta Quest, Horizon Worlds, and Ray-Ban Stories. While we’ll continue to purchase silicon chips from our partners, designing our own silicon allows us to optimize specifically for our critical workloads and gain full control over the entire stack – from silicon, to the system, to software and the application.

This was such a fun and unique experience, especially when I first started and the team was really, really small. We were able to fit into a conference room along with the software team and whiteboard all the different ideas and features we wanted to implement. I don’t think I’ve ever had that kind of experience anywhere else. Although the team has grown quite a bit since then, we still try to maintain that scrappy culture.

What did you and the team learn from this process?

OW: I learned how important it is to have a hands-on team capable of jumping into other roles to get the job done. We operate in many ways like a startup in that we have to wear many hats and take on other challenges beyond our usual work. So even though I’m the design lead, in addition to leading the project development, I also roll up my sleeves to code and help out wherever needed.

What are you looking forward to next? What’s next for the AI silicon design team?

OW: AI is central to our work at Meta. The recommendation system is obviously a big part of our AI models, but beyond that, we also have GenAI and video processing use cases that have different requirements. This brings us a lot of opportunities to create products tailored for each need.

Having MTIA in-house gives us a tremendous amount of learnings we can incorporate in our products. In addition, we maintained the user experience and developer efficiency offered by PyTorch eager-mode development. Developer efficiency is a journey as we continue to support PyTorch 2.0, which supercharges how PyTorch operates at the compiler level, under the hood. We’re continuing to gather feedback and input from our AI software teams to shape the features of our future AI silicon.

As we work on the next generations of MTIA chips, we’re constantly looking at bottlenecks in the system, such as memory and communication across different chips, so that we can put together a well-balanced solution to scale and future-proof our silicon.

What advice would you give to women or other historically underrepresented groups interested in pursuing a career as engineers?

OW: I’d encourage them to actively participate and not shy away from speaking up in meetings or discussions so people can know what they can accomplish. The other thing is to look for mentors within the team. They don’t have to be the same as you. Having a mentor is always good, particularly early in your career, to help guide you and prioritize what will help you advance.

Meta’s Infra team, as well as Meta more broadly, has a mentor program for women engineers and underrepresented people. We offer both a group coaching program and one-on-one coaching. I’ve done both of these and really enjoy having the opportunity to mentor. I’ve found that it’s very helpful for junior engineers to have the opportunity to get coaching and mentoring from senior people in the company.

What about Meta’s culture and technical advancements makes it such a prime time for engineers, researchers, and developers to be at the company?

OW: Meta is an amazingly open company with a very collaborative culture, and a great place to learn and grow. We provide resources to help people quickly become familiar with the entire stack, even if they have no prior exposure to certain parts. This includes everything from the silicon to the firmware, the compiler, the application, as well as the large-scale system design that we’re putting into the data center. The sheer scale at which Meta has been deploying the application also creates a dimension of challenges that makes it interesting and rewarding to work here.

