Vendor addition
From a vendor perspective I think you are all correct. It actually comes down to what the render workload is and what the network I/O is as a ratio of the CPU workload.
Some additional points:
1. RDMA can function on both InfiniBand and Ethernet (it's called RoCE on Ethernet). In fact, a new RoCE version has recently come out that allows it to operate on L3 Ethernet, making it routable (i.e. Routable RoCE, a.k.a. RoCEv2).
2. RDMA/RoCE will virtually remove all of the network overhead on your render CPUs (it's direct memory access), as it completely bypasses the kernel and the TCP stack.
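As a quick sanity check that RDMA is actually in play between two hosts, a sketch using the standard rdma-core / librdmacm tools (assumes they are installed; the address below is an example):

```shell
# List the RDMA-capable devices the host can see (from rdma-core)
ibv_devices

# RDMA-level ping between two hosts using librdmacm's rping.
# On the server:
rping -s -v
# On a client (substitute your server's address):
rping -c -a 192.168.1.10 -v -C 3
```

If rping completes, the RDMA path (IB or RoCE) is working end to end before you layer any ULP on top of it.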
3. RDMA/RoCE is already baked into loads of upper-layer protocols (ULPs): NFS, SMB (called SMB Direct), iSCSI (called iSER), GPFS, SRP (SCSI over RDMA) and others. It's also native in the mainstream OSes, including both Linux and Windows; there are a few switches you need to flip, but it's relatively easy to use.
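To give a feel for how few switches that really is, here's a sketch of an NFS-over-RDMA mount on a Linux client (server name, export and mount point are placeholders; the client module name varies by kernel version, with xprtrdma on older kernels):

```shell
# Load the RPC-over-RDMA transport on the client
modprobe rpcrdma

# Mount the export over RDMA instead of TCP
# (20049 is the conventional NFS/RDMA port)
mount -t nfs -o rdma,port=20049 server:/export /mnt/render
```

Everything above the transport is still plain NFS, so render nodes and applications don't need to change at all.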
4. The entry point is 10GbE, but obviously the benefits increase as you go to higher speeds, as the delta to traditional TCP grows. It also interoperates, so you can run standard TCP and RoCE over the same network at the same time, which drives towards a single network and reduces cost, especially as a single network could potentially mean no FC, etc.
5. With Mellanox, the adapter is agnostic to both the network type and the speed (it's called VPI, Virtual Protocol Interconnect), so you can run RDMA/RoCE on both IB and Ethernet at up to 56Gb/56GbE. Yes, I know 56GbE Ethernet is proprietary, but it's in the product set, so you might as well use the extra 40% of horsepower.
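For what it's worth, on a VPI card the port personality is just a firmware setting. A sketch using Mellanox's mlxconfig tool (the device path is an example for a ConnectX-3; your path and exact values may differ, so treat this as illustrative):

```shell
# Start the Mellanox software tools and find the device
mst start
mst status

# Query the current port configuration (example device path)
mlxconfig -d /dev/mst/mt4099_pci_cr0 query

# Set port 1 to Ethernet (2 = ETH, 1 = IB), then reboot to apply
mlxconfig -d /dev/mst/mt4099_pci_cr0 set LINK_TYPE_P1=2
```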
6. You can also use commodity hardware. Take a look at PixitMedia: their solution is built entirely on commodity hardware components and can scale to 30 simultaneous streams of 16-bit 4K (4x3) EXR at 25fps. Sure, you won't need this storage horsepower for render, but it does scale out linearly and cost-effectively, and having a single namespace for the lot makes things a lot easier for people.
In my opinion, go with what you need. If you are building out a render farm that doesn't have or need high network I/O, go with 10GbE and use RoCE; this will give you CPU cycles back at no extra cost. If your render does have high network I/O, go with something more powerful. It's all workload dependent. Horses for courses, as they say!
Rich.
PS. Tin-hat and flame suit are on. :-)
From: studiosysadmins-discuss-bounces@studiosysadmins.com [mailto:studiosysadmins-discuss-bounces@studiosysadmins.com]
On Behalf Of Nick Allevato
Sent: 22 September 2014 05:10
To: studiosysadmins-discuss@studiosysadmins.com
Subject: Re: [SSA-Discuss] iWARP RDMA Solution for Superfast NFS-RDMA Rendering
This seems like a heated topic already; I like it.
I think I had a dirty dream about this once.
It's like Open Compute 56.0
On Sun, Sep 21, 2014 at 9:40 AM, Saker Klippsten <sakerk@gmail.com> wrote:
This is the next big thing, guys.
Yes, the goal is to speed things up and also scale them up while maintaining that speed, all at a cost factor the project can handle, so that you end up with a profit, not a loss (which is what most projects these days seem to be trending towards).
What is the renderer / OS that you are testing? Software plays the largest factor in overall render time.
It's one thing to transfer the dependencies around a cluster at X speed, but there is still X amount of raw compute time to render the frame. If you are chopping a frame up into buckets and have 10 nodes render a single frame, there is going to be a speed advantage for the look-dev stage. If you are rendering a sequence, it's going to take just about the same time.
If I can use GPU rendering, with say 8 of them in a chassis, I don't have to worry about scaling as much, and even then the cost to transfer dependencies is small until the raw compute time drops below that of the network I/O, which it very rarely does. Right now the CPU/GPU work takes longer than transferring the scene file and textures for most projects. Any decent 10GbE network should be fine.
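To make that ratio argument concrete, here's a hypothetical back-of-envelope calculation (the payload size, link efficiency and frame time are illustrative assumptions, not measurements): even a fairly heavy 20 GB of scene and texture dependencies crosses 10GbE in seconds, so the network only becomes the bottleneck once per-frame compute drops below that.

```python
def transfer_seconds(payload_gb: float, link_gbps: float,
                     efficiency: float = 0.8) -> float:
    """Time to move a payload over a link at an assumed wire efficiency."""
    payload_gbit = payload_gb * 8          # bytes -> bits
    return payload_gbit / (link_gbps * efficiency)

# Illustrative numbers: 20 GB of scene + textures over 10GbE at 80% efficiency
wire = transfer_seconds(20, 10)            # seconds on the wire
render = 9 * 60                            # a 9 min/frame render time

# Network I/O only dominates when compute time falls below transfer time
print(f"transfer: {wire:.0f}s, render: {render}s, "
      f"network-bound: {render < wire}")
```

With these numbers the transfer takes about 20 seconds against 540 seconds of compute, which is exactly the "compute dwarfs transfer" situation described above.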
IMO you should be speeding up the render for GPU and CPU, and looking at power costs for that giant cluster ;)
Yes, Mellanox vs. Chelsio, and at 56G. To reach 56G one must utilize InfiniBand and a certain set of adapters. That is the core concept behind really, really big render-time savings. One could go over Ethernet on Windows 2012 or Linux (Debian/Red Hat), but you need to be very sure that the PCIe cards support it. This is the next big thing, guys. I guess many of the ramp-ups across this nation and overseas (hardware procurement) probably did not take anything like this into account. Too bad for those responsible sysadmins..... ;) Because it looks like this stuff is speeding things up quite a bit. You can go 10G if you buy the right PCIe adapter cards, but it's also a BIOS thing, as I have been reading up. Luckily we have ramped up over 800,000 USD in the newest HP hardware here, so I believe the Z820s' and Z840s' BIOS supports this next-gen technology.
There are only a handful of IB cards and they are new, so don't go buy used equipment for these tests; there are also only a handful of 10G Ethernet adapter cards ready and waiting. In my dealings with render farms, the golden rule is to speed things up, right? Well, if you can bump render times down from, say, 90 mins/frame to 9 mins/frame, you are doing your job. FYI: also try to always build your own storage too ;) That's the easy part (and the cheaper part).
Jorg Mohnen, M.Sc. MBA
To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe
--
Nicolas Allevato, Ops, 5th Kind