Multipart

Form data

When a browser submits an HTML form in an HTTP request, the form data can be sent in the request body in one of two formats:

  • application/x-www-form-urlencoded

  • multipart/form-data

In most cases, application/x-www-form-urlencoded works well. However, it is not efficient for:

  • Sending files or images

  • Sending large quantities of binary data or text which contains non-ASCII characters

For example, suppose the following data needs to be sent:

  • Name

  • Age

Then application/x-www-form-urlencoded can be used to send this data. But suppose you also need to send the user's profile photo in the same request. The data now looks like this:

  • Name

  • Age

  • Photo

In this case, application/x-www-form-urlencoded is not efficient, and multipart/form-data should be used instead. In short: use application/x-www-form-urlencoded for simple form data, and multipart/form-data when the form also needs to send binary data.

Why is that?

application/x-www-form-urlencoded encodes each non-ASCII byte as 3 bytes. Essentially, an application/x-www-form-urlencoded request body is one giant query string: like the query string in a URI, it is a sequence of key-value pairs (a key may repeat) in the following format:

key1=value1&key2=value21&key2=value22&key3=value3
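As a minimal sketch, Python's standard urllib.parse module can produce and parse a body in this format. The keys and values are the illustrative ones from the line above; the accented name in the last call is an extra, made-up example of how non-ASCII characters are encoded.

```python
from urllib.parse import parse_qs, urlencode

# key2 has two values, so it is given as a list; doseq=True emits one pair per value.
fields = {"key1": "value1", "key2": ["value21", "value22"], "key3": "value3"}
body = urlencode(fields, doseq=True)
print(body)  # key1=value1&key2=value21&key2=value22&key3=value3

# Parsing reverses the process and groups repeated keys into lists.
print(parse_qs(body))

# Non-ASCII characters are UTF-8 encoded and every resulting byte becomes %XX;
# the space becomes "+".
print(urlencode({"name": "José Díaz"}))  # name=Jos%C3%A9+D%C3%ADaz
```

Each of "é" and "í" is two UTF-8 bytes, so each turns into six characters of output.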

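When sending application/x-www-form-urlencoded data, each character is first converted to bytes (UTF-8), and every byte that is not an ASCII letter, a digit, or one of a few unreserved symbols (such as -, . and _) is percent-encoded in the format below. A space is the one exception and becomes +.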

%WW

Here WW is the byte's value written as two hexadecimal digits (for an ASCII character, its ASCII code). Every encoded byte therefore turns into 3 bytes. Binary data such as a file or an image consists mostly of bytes outside the unencoded set, so the encoded payload grows to roughly two and a half to three times its original size (exactly three times in the worst case). This is why application/x-www-form-urlencoded is inefficient for sending large binary files or large amounts of non-ASCII text.
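The size increase can be measured with a short sketch. It uses urllib.parse.quote_plus, whose unencoded set (letters, digits and _.-~) differs slightly from an HTML form's, but not in size: in both cases 66 byte values pass through unchanged, the space becomes +, and the remaining 189 byte values become %XX. The input containing every byte value once is an assumed stand-in for compressed binary data such as a JPEG, whose byte values are spread roughly evenly.

```python
from urllib.parse import quote_plus

def urlencoded_size(data: bytes) -> int:
    """Size of `data` after form-style percent-encoding."""
    # quote_plus keeps A-Z, a-z, 0-9 and "_.-~" as-is, turns a space into "+",
    # and percent-encodes every other byte as the 3-character sequence %XX.
    return len(quote_plus(data))

# Every possible byte value exactly once.
all_bytes = bytes(range(256))
print(urlencoded_size(all_bytes), urlencoded_size(all_bytes) / len(all_bytes))  # 634 2.4765625

# Worst case: no byte survives unencoded, so the size triples.
zeros = bytes(256)
print(urlencoded_size(zeros) / len(zeros))  # 3.0
```

The expected ratio for evenly spread bytes follows directly from the counts: (67 × 1 + 189 × 3) / 256 ≈ 2.48.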

Now let’s understand the format of multipart/form-data.

A multipart/form-data body consists of several parts separated by a delimiter called the boundary. Each part is described by its own headers. The format of multipart/form-data is as follows:

--<boundary>
Content-Disposition: form-data; name="<field name>"
Content-Type: <content type>

[DATA]
--<boundary>
Content-Disposition: form-data; name="<field name>"; filename="<file name>"
Content-Type: <content type>

[DATA]
--<boundary>--

  • Each part is separated by a delimiter or boundary.

  • Each part contains its own headers that describe the type of its data.

  • The Content-Disposition header of each part has the value form-data and includes a name parameter that holds the field's key. If the part is a file, the header also includes a filename parameter.

  • Each part will also contain its own data.
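The rules above can be sketched as a small Python function that serializes one part. The name build_part and its parameters are hypothetical, and the sketch does not escape quotes or other special characters in field or file names, which a real encoder must handle. HTTP uses CRLF (\r\n) line endings, so the sketch does too.

```python
from typing import Optional

def build_part(boundary: str, name: str, data: bytes,
               content_type: str = "text/plain",
               filename: Optional[str] = None) -> bytes:
    """Serialize one multipart/form-data part, starting with its delimiter line."""
    disposition = f'form-data; name="{name}"'
    if filename is not None:
        # A file part also carries the file's name.
        disposition += f'; filename="{filename}"'
    headers = (
        f"--{boundary}\r\n"
        f"Content-Disposition: {disposition}\r\n"
        f"Content-Type: {content_type}\r\n"
        "\r\n"  # a blank line ends this part's headers
    )
    # The data is appended as raw bytes, with no percent-encoding.
    return headers.encode("utf-8") + data + b"\r\n"

boundary = "xyz"
body = (
    build_part(boundary, "name", b"John")
    + build_part(boundary, "photo", b"\xff\xd8\xff\xe0", "image/jpeg", "photo.jpeg")
    + f"--{boundary}--\r\n".encode("ascii")  # the closing delimiter ends the body
)
print(body)
```

The four bytes given for the photo are a placeholder, not a real image.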

Suppose we send the following data in a multipart/form-data request:

  • name = John

  • age = 23

  • photo = some binary data

And suppose the boundary is

xyz

Then the request body looks like this:

--xyz
Content-Disposition: form-data; name="name"
Content-Type: text/plain

John
--xyz
Content-Disposition: form-data; name="age"
Content-Type: text/plain

23
--xyz
Content-Disposition: form-data; name="photo"; filename="photo.jpeg"
Content-Type: image/jpeg

[JPEG DATA]
--xyz--
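Going the other way, a server learns the boundary from the request's Content-Type header (here multipart/form-data; boundary=xyz) and splits the body on it. The sketch below parses the example body; parse_multipart is a hypothetical helper, and four placeholder bytes stand in for [JPEG DATA]. It assumes the boundary never occurs inside any part's data, which is why real clients generate long random boundaries rather than xyz, and it skips details a production parser must handle, such as quoted-string escapes and streaming.

```python
import re

def parse_multipart(body: bytes, boundary: str) -> dict:
    """Split a multipart/form-data body into {field name: part info}."""
    # The CRLF before each "--boundary" line belongs to the delimiter, so one
    # CRLF is prefixed to let the very first delimiter match the same pattern.
    delimiter = b"\r\n--" + boundary.encode("ascii")
    chunks = (b"\r\n" + body).split(delimiter)
    fields = {}
    for chunk in chunks[1:]:          # chunks[0] is the empty preamble
        if chunk.startswith(b"--"):   # "--xyz--" marks the end of the body
            break
        # Skip the CRLF ending the delimiter line; a blank line separates headers from data.
        head, _, data = chunk[2:].partition(b"\r\n\r\n")
        headers = {}
        for line in head.decode("utf-8").split("\r\n"):
            key, _, value = line.partition(":")
            headers[key.strip().lower()] = value.strip()
        # Pull name="..." and, for files, filename="..." out of Content-Disposition.
        params = dict(re.findall(r'(\w+)="([^"]*)"', headers["content-disposition"]))
        fields[params["name"]] = {
            "filename": params.get("filename"),
            "content_type": headers.get("content-type", "text/plain"),
            "data": data,
        }
    return fields

EXAMPLE_BODY = (
    b"--xyz\r\n"
    b'Content-Disposition: form-data; name="name"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"John\r\n"
    b"--xyz\r\n"
    b'Content-Disposition: form-data; name="age"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"23\r\n"
    b"--xyz\r\n"
    b'Content-Disposition: form-data; name="photo"; filename="photo.jpeg"\r\n'
    b"Content-Type: image/jpeg\r\n"
    b"\r\n"
    b"\xff\xd8\xff\xe0\r\n"  # placeholder for [JPEG DATA]
    b"--xyz--\r\n"
)

for name, part in parse_multipart(EXAMPLE_BODY, "xyz").items():
    print(name, part["filename"], part["content_type"], part["data"])
```

The photo's bytes come back exactly as they were sent, which is the property that makes this format suitable for files.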

Because multipart/form-data sends binary data as-is, without percent-encoding, it is the format used for files and large binary data. So why not use multipart/form-data all the time?

Because for small payloads, the overhead of the boundary strings and per-part headers outweighs any savings. For example, suppose we need to send the following data:

  • name=John

  • age=23

With application/x-www-form-urlencoded, the body is simply:

name=John&age=23

With multipart/form-data, the body is as follows, which is more than ten times as much data as the application/x-www-form-urlencoded version:

--xyz
Content-Disposition: form-data; name="name"
Content-Type: text/plain

John
--xyz
Content-Disposition: form-data; name="age"
Content-Type: text/plain

23
--xyz--
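The comparison can be checked by building both bodies for the same two fields. The sketch assumes CRLF line breaks, the boundary xyz, and no trailing CRLF after the closing delimiter.

```python
from urllib.parse import urlencode

fields = {"name": "John", "age": "23"}

# application/x-www-form-urlencoded: one query-string-like line.
urlencoded = urlencode(fields).encode("ascii")

# multipart/form-data: a delimiter line, two headers and a blank line per field.
boundary = "xyz"
multipart = b"".join(
    (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{key}"\r\n'
        "Content-Type: text/plain\r\n"
        "\r\n"
        f"{value}\r\n"
    ).encode("utf-8")
    for key, value in fields.items()
) + f"--{boundary}--".encode("ascii")

print(len(urlencoded), len(multipart), len(multipart) / len(urlencoded))  # 16 176 11.0
```

Almost all of the 176 bytes are fixed per-part overhead, so the gap shrinks only as the field values themselves grow large.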

How to upload large files

How can we optimize performance when uploading large files to an object storage service such as S3?

Before we answer this question, let's look at why this process needs optimizing. Some files can be larger than a few GBs. It is possible to upload such a large file directly, but it could take a long time, and if the network connection fails in the middle of the upload, we have to start over. A better solution is to split the large object into smaller parts and upload them independently. After all the parts are uploaded, the object store reassembles the object from the parts. This process is called multipart upload.

The diagram below illustrates how multipart upload works:

1. The client calls the object storage service to initiate a multipart upload.

2. The data store returns an uploadID, which uniquely identifies the upload.

3. The client splits the large file into smaller parts and starts uploading them. Let's assume the file is 1.6 GB and the client splits it into 8 parts, so each part is 200 MB. The client uploads the first part to the data store, together with its part number and the uploadID it received in step 2.

4. When a part is uploaded, the data store returns an ETag, which is essentially the MD5 checksum of that part. ETags are used to verify the multipart upload.

5. After all parts are uploaded, the client sends a complete multipart upload request, which includes the uploadID, part numbers, and ETags.

6. The data store reassembles the object from its parts based on the part numbers. Because the object is very large, this process may take a few minutes. After reassembly is complete, it returns a success message to the client.
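The six steps can be simulated end to end with an in-memory stand-in for the object store. InMemoryObjectStore, its method names and upload_in_parts are all hypothetical; real SDKs expose comparable calls (boto3's S3 client, for instance, has create_multipart_upload, upload_part and complete_multipart_upload), but this sketch does not use them. It assumes, as step 4 describes, that an ETag is the MD5 digest of the part, and it ignores real-service limits (S3, for example, allows part numbers from 1 to 10,000 and requires every part except the last to be at least 5 MiB). Parts are deliberately uploaded in reverse order to show that reassembly depends only on the part numbers.

```python
import hashlib
import uuid

class InMemoryObjectStore:
    """Toy object store that follows the multipart-upload steps above."""

    def __init__(self):
        self.uploads = {}  # uploadID -> {"key": ..., "parts": {part_number: bytes}}
        self.objects = {}  # object key -> reassembled bytes

    def initiate_multipart_upload(self, key: str) -> str:
        # Steps 1-2: register the upload and return a unique uploadID.
        upload_id = uuid.uuid4().hex
        self.uploads[upload_id] = {"key": key, "parts": {}}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> str:
        # Steps 3-4: store one part and return its ETag (MD5 of the part).
        self.uploads[upload_id]["parts"][part_number] = data
        return hashlib.md5(data).hexdigest()

    def complete_multipart_upload(self, upload_id: str, parts: list) -> str:
        # Steps 5-6: verify each (part_number, etag), then join parts by part number.
        upload = self.uploads.pop(upload_id)
        stored = upload["parts"]
        for part_number, etag in parts:
            if hashlib.md5(stored[part_number]).hexdigest() != etag:
                raise ValueError(f"ETag mismatch for part {part_number}")
        ordered = sorted(part_number for part_number, _ in parts)
        self.objects[upload["key"]] = b"".join(stored[n] for n in ordered)
        return "success"


def upload_in_parts(store: InMemoryObjectStore, key: str, data: bytes, part_size: int) -> str:
    upload_id = store.initiate_multipart_upload(key)
    # Part numbers start at 1; the last part may be shorter than part_size.
    offsets = list(enumerate(range(0, len(data), part_size), start=1))
    parts = []
    # Upload in reverse to show that arrival order does not matter.
    for part_number, offset in reversed(offsets):
        etag = store.upload_part(upload_id, part_number, data[offset:offset + part_size])
        parts.append((part_number, etag))
    return store.complete_multipart_upload(upload_id, parts)


store = InMemoryObjectStore()
data = bytes(range(256)) * 4  # 1,024 bytes standing in for a large file
print(upload_in_parts(store, "big.bin", data, part_size=200))  # success
print(store.objects["big.bin"] == data)  # True

# The 1.6 GB example from step 3: 200 MB parts (decimal units) give 8 parts.
print(len(range(0, 1_600_000_000, 200_000_000)))  # 8
```

Because each part is uploaded and acknowledged on its own, a failed part can be retried without resending the others, which is the advantage the section describes over a single large upload.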
