LOL - the progress bar's not showing anything yet
It seems funny to say that one of my ambitions over the last six years has been to find a common technology that would enable large file uploads to the web. When I say large files I mean files that are 100MB, 10GB or bigger. There are so many things that make large file uploads to a website a complete pain in the ass, so let me just list a few.
- HTML INPUT TYPE=FILE just plain sucks
- For example, when a browser tries to POST a 10MB file it loads the entire file into that single HTTP call. It's just too much data for one call to a web server (it needs to be multiple small calls in parts, i.e. chunking)
- On most web servers, large file uploads require the web server's script timeouts to be changed
- Even if you get the file to a web server there is a chance that it will be deleted, because some web servers have file size restrictions that are not enforced until the entire file is there (more server tuning required)
- Have you ever tried to HTTP POST a 10GB file to the web? If it worked it would take forever and you would get no progress indicators of any kind (all browsers behave the same way on this.)
The second category of pain in the ass for large file uploads is the available technologies. I've tried Java large file upload tools that worked with a non-specific back-end server, server-specific ASP.Net ActiveX solutions that use ISAPI, and I have even played around a little with Adobe Air (no threading support) and Flash, and last would be Google Gears. Gears was the only one that just worked, but it took a hard deep dive into some working code in the YouTube video uploader to understand it.
I believe that the Google Gears uploader is still using the same underpinnings, but you can check for yourself on the site at http://upload.youtube.com/my_videos_upload. Years back a peer and I spent some time looking at their code and he made it work for an internal PHP site used by only a few people. Note that no one seemed to like the fact that they had to install Gears, though. An interesting thing to note about the YouTube uploader is that it's written with worker process support (a separate thread) and thus provided a responsive UI that did not block while uploading the file. The YouTube solution took the file selected for upload, stored it in the Gears SQLite DB and then processed the upload in chunks. I think they might even have had pause/resume, but when it comes right down to it the solution just worked.
More under the covers of the YouTube uploader was the use of a separate domain name for the file upload, which I suspect pointed to a special web server farm. I believe this kind of upload would bind the user's session to a single server until the file upload was complete. If they have since updated the solution to a cloud computing setup then they would likely upload in parts (probably to a server DB) to the cloud, signal completion when all done, and then have a back-end cloud worker process reassemble the file for subsequent conversion/processing. All in all it worked and it was great!
I've actually thought about this on and off over the years, but aside from some attempts to do this with Adobe Air (no success there) I had not found a technology that I could make this work in until recently, when I started programming in Silverlight. I'm sure it could be done as a Java applet but I've no desire to learn Java (sorry.)
So let me just list out what I think the necessary basics are for the upload, and after I present my code I will suggest some additions for later.
- A file dialog selector that runs in all browsers and filters by extension or other criteria.
- Once a file is selected the ability to check the size before starting an upload
- The ability to break the file into manageable chunks that can be uploaded to any web server
- A cancel button to stop the upload
- Chunking capability for transfer in a format that can be decoded by any web server code (base64)
- Progress indicators so you know that work is going on.
- Responsive UI that does not bog web client down.
That's about all I can think of at the moment but I can edit this later.
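To make the chunking and base64 items in the list above concrete, here is a minimal sketch in plain .NET C# (not Silverlight-specific). The `Chunker` class and `ToBase64Chunks` method are hypothetical names made up for this illustration; they are not part of the project code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class Chunker
{
    // Hypothetical helper: lazily yields the stream's contents as a
    // sequence of base64 strings, one per chunk, in file order.
    public static IEnumerable<string> ToBase64Chunks(Stream input, int chunkSize)
    {
        byte[] buffer = new byte[chunkSize];
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Encode only the bytes actually read; the last chunk is
            // usually shorter than chunkSize.
            yield return Convert.ToBase64String(buffer, 0, bytesRead);
        }
    }
}
```

Each yielded string can be decoded on its own with `Convert.FromBase64String`, so the server can append chunks as they arrive. Because the method is an iterator, only one chunk is held in memory at a time, which is the point when the file is several gigabytes.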
I picked Silverlight because it works a lot like Gears but it's gaining adoption at a monster-fast rate, since companies like Hulu are using it for their video players, and you will find one version or another on about 60% of all Windows boxes (I looked up that stat but don't have the link to provide here to back it up, sorry.) Silverlight runs in most modern browsers and on OSX (you can see the System Requirements here http://www.microsoft.com/getsilverlight/Get-Started/Install/Default.aspx.)
Silverlight has limitations but supports a good strong threading model and browser-based web requests. Note: Silverlight in-browser web requests use the exact same underlying browser mechanism for web requests and hence carry some of the same limitations, like same-domain security restrictions and a limit of two simultaneous connections to a domain (IE8 raises this to six.) Silverlight can work around the cross-domain issue with a cross-domain policy file placed on the server, which is cool, but you still take up one of your browser connections so it's something you should be aware of.
I'm going to give this in the following pieces: the XAML, the code behind the XAML, and the ASHX code behind.
If you want to do this on your own you can just create a new Silverlight 4.0 project in Visual Studio or Visual Web Developer Express (I used version 2010 for this project.) Then add an ASHX file to the web part of the project (name of the file is embedded in the code below.)
Here is the XAML
And now the code behind the XAML
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Documents;
using System.Windows.Input;
using System.Windows.Media;
using System.Windows.Media.Animation;
using System.Windows.Shapes;
using System.Threading;
using System.ComponentModel;

namespace LargeFileUpload
{
    public partial class MainPage : UserControl
    {
        // Static so the worker thread can reach the stream opened on the UI thread.
        public static System.IO.Stream fs;
        private BackgroundWorker bw = new BackgroundWorker();

        public MainPage()
        {
            InitializeComponent();
            bw.WorkerReportsProgress = true;
            bw.WorkerSupportsCancellation = true;
            bw.RunWorkerCompleted += new RunWorkerCompletedEventHandler(bw_RunWorkerCompleted);
            bw.ProgressChanged += (s, e) =>
            {
                long bytesReadTotal = (long)e.UserState;
                textBox1.Text = String.Format("Sent {0} out of {1}",
                    fileSizeString(bytesReadTotal), fileSizeString(fs.Length));
                progressBar1.Value = e.ProgressPercentage;
            };
            bw.DoWork += (s, e) => { uploadFileToWeb((string)e.Argument, (BackgroundWorker)s, e); };
        }

        string fileSizeString(long bytes)
        {
            string f;
            if (bytes > 1073741824)
                f = String.Format("{0:0.00} Gb", (float)bytes / (float)1073741824);
            else if (bytes > 1048576)
                f = String.Format("{0:0.00} Mb", (float)bytes / (float)1048576);
            else if (bytes > 1024)
                f = String.Format("{0:0.00} Kb", (float)bytes / (float)1024);
            else
                f = String.Format("{0} bytes", bytes);
            return f;
        }

        void bw_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
        {
            if (e.Cancelled)
            {
                progressBar1.Value = 0;
                textBox1.Text = "Upload Canceled!";
            }
            else if (e.Error != null)
            {
                textBox1.Text = "Error: " + e.Error.Message;
            }
            else
            {
                progressBar1.Value = 100;
                textBox1.Text = "File upload completed!";
            }
            fs.Close(); // the worker is finished with the file either way
            cmdCancelUpload.IsEnabled = false;
            cmdStartUpload.IsEnabled = true;
        }

        private void cmdCancelUpload_Click(object sender, RoutedEventArgs e)
        {
            bw.CancelAsync();
            cmdCancelUpload.IsEnabled = false;
        }

        private void cmdStartUpload_Click(object sender, RoutedEventArgs e)
        {
            OpenFileDialog of = new OpenFileDialog();
            bool? userClickedOK = of.ShowDialog();
            if (userClickedOK == true)
            {
                progressBar1.Value = 0;
                progressBar1.Maximum = 100;
                textBox1.Text = of.File.Name;
                fs = of.File.OpenRead();
                string fileName = of.File.Name;
                cmdCancelUpload.IsEnabled = true;
                cmdStartUpload.IsEnabled = false;
                bw.RunWorkerAsync(fileName); // calls DoWork()
            }
        }

        void uploadFileToWeb(string fileName, BackgroundWorker worker, DoWorkEventArgs e)
        {
            byte[] b = new byte[57344];
            int bytesRead = fs.Read(b, 0, b.Length);
            long bytesReadTotal = bytesRead;
            while (bytesRead > 0)
            {
                // has a request to stop early been made?
                if (worker.CancellationPending)
                {
                    e.Cancel = true;
                    break;
                }

                AutoResetEvent a = new AutoResetEvent(false);
                WebClient wc = new WebClient();
                wc.Headers["Content-Type"] = "application/x-www-form-urlencoded";
                Exception uploadError = null;
                wc.UploadStringCompleted += (s, e1) =>
                {
                    uploadError = e1.Error; // null when the chunk was accepted
                    a.Set();                // release the worker thread waiting below
                };
                // The base64 text is sent unescaped, so the server has to turn
                // the spaces it receives back into '+'.
                wc.UploadStringAsync(new Uri("/StoreFile.ashx", UriKind.Relative), "POST",
                    "filename=" + Uri.EscapeDataString(fileName)
                    + "&filestream=" + Convert.ToBase64String(b, 0, bytesRead));
                a.WaitOne(); // block until this chunk has been sent

                // Throwing here stops the loop and shows up as e.Error
                // in bw_RunWorkerCompleted.
                if (uploadError != null)
                    throw new Exception(uploadError.Message, uploadError);

                int percentComplete = (int)((double)bytesReadTotal / (double)fs.Length * 100);
                worker.ReportProgress(percentComplete, (Object)bytesReadTotal);

                bytesRead = fs.Read(b, 0, b.Length);
                bytesReadTotal += bytesRead;
            }
        }
    }
}
And last but not least the ASHX code behind
using System;
using System.IO;
using System.Web;

namespace LargeFileUpload.Web
{
    /// <summary>
    /// Appends each posted base64 chunk to the named file.
    /// </summary>
    public class StoreFile : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            context.Response.ContentType = "text/plain";

            // A '+' in the unescaped base64 text arrives as a space after
            // form decoding, so put it back before decoding the chunk.
            string check = context.Request["filestream"];
            check = check.Replace(" ", "+");
            byte[] bytes = Convert.FromBase64String(check);

            // Keep only the file name so a posted path cannot point
            // outside the handler's folder.
            string fileName = context.Server.MapPath(
                Path.GetFileName(context.Request["filename"].Trim()));

            // FileMode.Append creates the file if it does not exist yet.
            using (FileStream fs = new FileStream(fileName, FileMode.Append, FileAccess.Write))
            {
                fs.Write(bytes, 0, bytes.Length);
            }
            //System.Threading.Thread.Sleep(500);
            context.Response.Write("OK");
        }

        public bool IsReusable
        {
            get { return false; }
        }
    }
}
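The handler's step that turns spaces back into `+` can look odd, so here is a small sketch of why it is needed. In a form-encoded body an unescaped `+` means a space, and base64 text can contain `+`. The `FormBase64` class and both method names are hypothetical; `SimulateFormDecode` only imitates the one part of form decoding that matters here.

```csharp
using System;

public static class FormBase64
{
    // Imitates what form decoding does to an unescaped '+': it becomes a space.
    public static string SimulateFormDecode(string raw)
    {
        return raw.Replace('+', ' ');
    }

    // Server-side repair: restore the '+' characters, then decode the chunk.
    // Safe because Convert.ToBase64String never emits spaces itself.
    public static byte[] DecodeChunk(string received)
    {
        return Convert.FromBase64String(received.Replace(' ', '+'));
    }
}
```

For example, the two bytes `0xFB 0xEF` encode to `++8=`. After form decoding the server sees two spaces followed by `8=`, which is no longer valid base64 until the spaces are swapped back.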
The first iteration of this code blew up because I wrote a quick sample to read a file into isolated storage, as I reasoned that it would be a good place to put the file in case I wanted to add pause/resume features down the line. What I learned is that moving a 6GB file from disk to isolated storage is just as slow as moving the file from one drive to another, and I did not want to slow this solution down so I scrapped that approach.
The second try had me reading the file on the UI thread alone and simulating a file upload, but I never thought that would work anyway, as all the busy work on the same thread makes it non-responsive and with a very large file you get a bricked app.
For the next try I wanted to use a standard Thread to read the file. Since Silverlight uses a pretty strong sandbox and my threading skills are newbie at best, for the life of me I could not fathom how to get an open FileStream handle over to a new Thread. After a ton of playing around I found a way around this using a public static variable.
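One alternative to the static variable, sketched here in plain .NET C# and not tested in Silverlight, is to let a lambda capture the stream so the new thread sees it directly. `StreamHandoff` and `CountBytesOnWorker` are hypothetical names for this illustration. With a `BackgroundWorker`, the same idea can be expressed by passing an object that holds the stream to `RunWorkerAsync`, since that method accepts any `object` argument.

```csharp
using System;
using System.IO;
using System.Threading;

public static class StreamHandoff
{
    // Reads the whole stream on a separate thread and returns the byte count.
    // The lambda captures 'input' and 'total', so no static field is needed.
    public static long CountBytesOnWorker(Stream input)
    {
        long total = 0;
        Exception failure = null;
        Thread worker = new Thread(() =>
        {
            try
            {
                byte[] buffer = new byte[57344]; // same chunk size as the post
                int bytesRead;
                while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
                    total += bytesRead;
            }
            catch (Exception ex)
            {
                failure = ex; // hand the error back to the calling thread
            }
        });
        worker.Start();
        worker.Join(); // fine for a demo; a real UI thread should not block here
        if (failure != null) throw new IOException("Worker failed", failure);
        return total;
    }
}
```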
The next bit of fun came in trying to decide how to send the large chunks of data to the web server. HTTP GET has a typical query string size limit in most browsers of about 2048 characters, so I wanted to use an HTTP POST. This led to HttpWebRequest, which led to a crappy bunch of hard-to-read code. Silverlight HTTP communications are asynchronous, which means you have to wrap your head around a different way of doing things. WebRequest.Create, BeginGetRequestStream, AsyncCallback, GetRequestStreamCallback, blah, blah, blah, blah blah. How Microsoft can take something that was once simple and make it into a monster is beyond me. Some digging later on Google (hours later) I found a simple WebClient with UploadStringAsync and UploadStringCompleted, which was much simpler and worked.
With the final pieces in place the above is basically what I came up with, and though it's slow it seems pretty reliable. Start the 6GB file upload, go away for 6 hours and then come back to an upload that is just about to finish. Some of this slowness might be attributed to the fact that I'm doing all this in debug mode but I'm sure a lot of it has to do with the base64 encoding overhead.
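The upload code holds the worker thread with an AutoResetEvent until the asynchronous call completes. A slightly more defensive version of that pattern, as a sketch in plain .NET C#, adds a timeout and passes errors back to the caller. `Blocking.WaitFor` is a hypothetical helper, not a framework API, and the `ok` and `fail` callbacks are stand-ins for whatever the completed event reports.

```csharp
using System;
using System.Threading;

public static class Blocking
{
    // Starts a callback-style operation and blocks the calling thread until
    // it reports a result or an error, or until the timeout expires.
    public static T WaitFor<T>(Action<Action<T>, Action<Exception>> start, TimeSpan timeout)
    {
        T result = default(T);
        Exception error = null;
        // Deliberately not disposed: a late callback may still call Set()
        // after a timeout, and that must not throw.
        ManualResetEvent done = new ManualResetEvent(false);

        start(
            value => { result = value; done.Set(); },
            ex => { error = ex; done.Set(); });

        if (!done.WaitOne(timeout))
            throw new TimeoutException("No callback within " + timeout);
        if (error != null)
            throw new InvalidOperationException("Operation failed", error);
        return result;
    }
}
```

A call would wrap the asynchronous step, for example `Blocking.WaitFor<string>((ok, fail) => StartUpload(ok, fail), TimeSpan.FromMinutes(1))`, where `StartUpload` is an assumed method that wires `ok` and `fail` to the completed event. This must only be called from a background thread, since blocking the UI thread would freeze the page.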
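To put rough numbers on the base64 overhead, here is a small sketch of the arithmetic. `UploadMath` and its methods are hypothetical names, and the 6GB figure assumes 6 × 1,073,741,824 bytes, matching the constants in fileSizeString.

```csharp
using System;

public static class UploadMath
{
    // Base64 emits 4 characters for every 3 input bytes and pads the
    // final group, so the length is 4 * ceil(byteCount / 3).
    public static long Base64Length(long byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    // Number of POST requests needed to send a file in fixed-size chunks.
    public static long ChunkCount(long fileSize, int chunkSize)
    {
        return (fileSize + chunkSize - 1) / chunkSize;
    }
}
```

With the 57,344-byte buffer used in the upload code, `Base64Length(57344)` is 76,460 characters per chunk, about 33% more than the raw bytes, and `ChunkCount(6L * 1073741824, 57344)` is 112,348 requests. So a 6GB file turns into roughly 8GB of text sent one request at a time, each waiting for the previous one to finish.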
Cons for this solution ---
A major con of using Base64 encoded strings for chunked transfer is bloat. Turning byte data into a string representation makes the number of bytes transferred grow by about a third. I ran this on a 6GB file and it took five hours to transfer it to my own machine, but that was a debug build. Some enhancements that I would make would be
- To change over to using a random access file: on creating the file for the first time I would create it at its actual size and just write the bytes into the file at their respective locations. In Silverlight there is no FileStream open with FileShare.ReadWrite capability, but in the back-end ASP.Net code I can do that, and then I would be able to have multiple writers lock, write, then unlock sections of the file.
- In Silverlight I would also keep some structure listing the blocks not yet written, in case of a server-side error and the need to resend a block multiple times.
- Pause and resume would be nice, with the only caveat that the Silverlight control would need to remain open in the browser to keep the file handle open. If you close it and come back, even if I stored the block list in IsolatedStorage, Silverlight's security sandbox would require you to click a button to get the file handle open again.
- I would add a lot of error handling code to this solution but I wanted to keep it as simple as possible to convey the basics as I found them out.
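The random access and block list ideas in the list above could be sketched roughly like this in plain .NET C#. `BlockFile` is a hypothetical class for illustration only. It puts the pre-sized file and the pending-block list in one place, whereas in the design described above the file would live in the ASP.Net code and the list in Silverlight. It is not thread-safe and does no section locking.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public sealed class BlockFile
{
    private readonly string path;
    private readonly int blockSize;
    private readonly HashSet<long> pending = new HashSet<long>();

    // Creates the target file at its final size and marks every block as pending.
    public BlockFile(string path, long fileSize, int blockSize)
    {
        this.path = path;
        this.blockSize = blockSize;
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.SetLength(fileSize);
        }
        long blocks = (fileSize + blockSize - 1) / blockSize;
        for (long i = 0; i < blocks; i++) pending.Add(i);
    }

    // Writes one block at its own offset, so blocks can arrive in any
    // order and a failed block can simply be sent again.
    public void WriteBlock(long index, byte[] data, int count)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
        {
            fs.Seek(index * blockSize, SeekOrigin.Begin);
            fs.Write(data, 0, count);
        }
        pending.Remove(index);
    }

    public bool IsComplete { get { return pending.Count == 0; } }

    public IEnumerable<long> PendingBlocks { get { return pending; } }
}
```

Writing at an offset also means the server no longer depends on chunks arriving in order, which is what would make several simultaneous writers possible.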
Drop me comments if you see something wrong, or a cleaner way to hold up the worker thread for the WebClient's asynchronous call using proper thread locking code or whatever. I'm new to threading so I have likely done something not quite right in that area of the code.
Cheers and hope you like this.
4 comments:
Hi, your code is super. I need the same concept you have done for upload, but for downloading large files. For download I need to use SaveFileDialog.
I haven't gotten to it yet but I did post the working code on CodePlex at http://lfuis.codeplex.com/ and you can get it there.
Refactoring for download should not be too hard but finding the time will take some doing.
Great post, it saved me hours! I just wanted to point out that
a) this code works fine in SL 3.0 too and
b) if you're uploading text files (like me) you can use Encoding.UTF8.GetString(b, 0, bytesRead) to send the chunk with wc.UploadStringAsync(address,data) instead of encoding it base64 which will probably save bandwidth as well as use the goodness of POST instead of querystring.
c) if you're uploading binary files then I think UTF8 encoding is likely to throw on some invalid combination of bytes but maybe we can write a custom Encoding to get around this.
Cheers!
Panagiotis - Glad you liked it and could make use of it. I plan to update the download code sometime soon (next couple weeks or so.)