Hello all,
Thanks to everyone who attended our *Operate First Data Science Community
Meetup* earlier this week. It was great to see all the folks interested in
learning more and contributing to the current state of AI operations and
cloud infrastructure in an open way.
If you were unable to attend, you can find the presentation
<https://github.com/aicoe-aiops/operate-first-data-science-community/blob/...>
and the recording <https://youtu.be/rFaoM61dglM> of the meeting on our YouTube
channel. Please feel free to contact us with any questions or suggestions
you may have regarding this or future meetups.
In this meetup, @Selbi Nuryyeva <selbi(a)redhat.com> from the Red Hat
OpenShift Data Science team presented how to overcome the GPU hardware
limits of single-node training by taking advantage of GPUs on other
machines, thereby maximizing the utilization of GPUs in an open
cloud environment.
We look forward to seeing you at the next Meetup!
Best,
The Operate First Data Science Team
Links:
1. Calendar Invite:
https://calendar.google.com/calendar/u/2?cid=N3QyMm1ydm92amNmdTZqZm5ucDRu...
2. Topic suggestion:
https://github.com/aicoe-aiops/cloud-first-data-science-community/issues/...
3. Website:
https://www.operate-first.cloud/data-science/operate-first-data-science-c...
4. Agenda:
https://www.operate-first.cloud/data-science/operate-first-data-science-c...
5. Slack:
https://join.slack.com/t/operatefirst/shared_invite/zt-o2gn4wn8-O39g7sthT...
6. Mailing List:
https://lists.operate-first.cloud/admin/lists/community.lists.operate-fir...
7. Privacy Policy:
https://www.operate-first.cloud/data-science/operate-first-data-science-c...
--
Thanks and Regards,
Aakanksha Duggal
She/Her
Software Engineer
Red Hat Boston <https://www.redhat.com/>
AI Center of Excellence
Office of the CTO
aduggal(a)redhat.com
On Thu, Apr 14, 2022 at 2:11 PM Aakanksha Duggal <aduggal(a)redhat.com> wrote:
Hello all,
Please join us for another Operate First Data Science Community Meetup on
Tuesday, *April 19th, 2022 at 11:00 ET*. In this talk, @Selbi Nuryyeva
<selbi(a)redhat.com> will explain how one can distribute a machine learning
workflow across several nodes with GPU hardware in a cloud environment. She
will use PyTorch to carry out the ML training and Kubeflow, Node Feature
Discovery, and GPU operators to distribute the ML workload.
As datasets and models get bigger, the demand for more powerful and
efficient GPUs is rapidly increasing. Oftentimes a single GPU is not
adequate for an ML use case. An alternative to upgrading the GPU hardware
is to distribute the ML workload either across several GPUs on one node, or
across multiple nodes each containing one or several GPUs. The latter is
especially useful because a single machine can fit only so many GPUs.
Attendees will learn how to overcome the GPU hardware limits of
single-node training by taking advantage of GPUs on other machines,
thereby maximizing the utilization of GPUs in an open cloud environment.
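To make the idea of spreading work across nodes and GPUs concrete, here is a
minimal sketch in plain Python. It is not the code from the talk and it does
not use PyTorch or Kubeflow. It only illustrates, under simplifying
assumptions, how each GPU on each node can be given a global rank and a
disjoint slice of the training samples. The names `Worker`, `build_workers`,
and `shard_indices` are hypothetical and chosen for illustration. The strided
split is similar in spirit to what data-parallel samplers commonly do.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    """One training process, pinned to one GPU (illustrative only)."""
    node: int       # index of the machine
    local_gpu: int  # index of the GPU on that machine
    rank: int       # global index of this worker across all machines


def build_workers(num_nodes: int, gpus_per_node: int) -> list[Worker]:
    """Assign a global rank to every GPU on every node.

    Assumes every node has the same number of GPUs.
    """
    return [
        Worker(node=n, local_gpu=g, rank=n * gpus_per_node + g)
        for n in range(num_nodes)
        for g in range(gpus_per_node)
    ]


def shard_indices(num_samples: int, rank: int, world_size: int) -> list[int]:
    """Return the sample indices handled by one worker (strided split)."""
    return list(range(rank, num_samples, world_size))


if __name__ == "__main__":
    workers = build_workers(num_nodes=2, gpus_per_node=2)
    for w in workers:
        shard = shard_indices(10, w.rank, len(workers))
        print(f"node {w.node} gpu {w.local_gpu} (rank {w.rank}): samples {shard}")
```

With two nodes of two GPUs each, every one of the four workers sees a
different subset of the data, so adding a node adds capacity without
upgrading the GPUs on any single machine.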
Subscribe to our calendar
<https://calendar.google.com/calendar/u/2?cid=N3QyMm1ydm92amNmdTZqZm5ucDRu...>
for event details.
To learn more about the Operate First Data Science Community, visit our
website
<https://www.operate-first.cloud/data-science/operate-first-data-science-c...>
and find the agenda for the upcoming meetups here
<https://www.operate-first.cloud/data-science/operate-first-data-science-c...>.
We look forward to seeing you at the Meetup!
Best,
The Operate First Data Science Team