You might find somebody selling an ActiveX control designed to handle generic M-JPEG over HTTP streams. Such a control would raise an event for every frame, presenting the host program with the bytes of a JPEG frame, those bytes converted to a GDI DIB, or even an IPicture object you can treat as a StdPicture for easy display in VB. Or you might roll your own.
But as far as I can tell, all of the easy-to-use HTTP mechanisms, even those offering interim results chunk by chunk for async requests, buffer the entire response. So when receiving an M-JPEG stream, more and more memory gets consumed until the server finally closes the connection or your program aborts it. This seems to be the case for the WinInet control, MSXML.XMLHTTP, WinHttpRequest, and even calls to UrlMon's URLOpenStream. With frames averaging 40KB at 15 f/s, that's 600KB more memory used every second you stream video!
This is an issue because MJPEG over HTTP works in a "long poll" or COMET-like fashion rather than "normal" HTTP request/response operation.
So this means rolling your own async/chunking HTTP Request object first, based on raw TCP. Not trivial because you need to handle authentication. Cameras I see range from no-auth, to simple URL query string auth, to Basic Authentication and even Digest Authentication.
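As a rough illustration of the simplest authenticated case (Basic Authentication), here is a sketch of building the raw request text you would send over the TCP socket. It is written in Python for brevity, and the same string building ports to VB6. The names [tt]basic_auth_header[/tt] and [tt]build_request[/tt], and the path [tt]/video.mjpg[/tt] in the usage line, are made up for the example. Requesting HTTP/1.0 is my assumption for keeping the camera from replying with chunked transfer encoding. Digest Authentication needs a challenge/response round trip and is not shown.

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build an Authorization header line for HTTP Basic Authentication."""
    # Basic auth is just base64("user:password"); it is not encrypted.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"

def build_request(host: str, path: str, user=None, password=None) -> bytes:
    """Assemble the raw bytes of a GET request to write to a TCP socket."""
    # Assumption: HTTP/1.0 is requested so the response body arrives unchunked.
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}"]
    if user is not None:
        lines.append(basic_auth_header(user, password or ""))
    # A blank line terminates the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Usage would look something like [tt]sock.sendall(build_request("192.168.1.50", "/video.mjpg", "admin", "secret"))[/tt], with the host, path, and credentials being whatever your camera expects.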
These cameras also appear to be quirky. Some fail to correctly implement [tt]Content-Type: multipart/x-mixed-replace[/tt] headers (which MJPEG uses), so your code would need to handle that. Based on articles and samples I found, a few even used to send spaces instead of hyphens in response header names (giving you [tt]Content Type:[/tt] instead of [tt]Content-Type:[/tt]).
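One way to cope with the header quirks is to parse leniently: lowercase the header names and treat a space in a name as a hyphen before looking anything up. A minimal sketch in Python follows (the logic is plain string handling and ports to VB6). The function names [tt]parse_headers[/tt] and [tt]find_boundary[/tt] are invented for the example, and real cameras may have quirks this does not cover.

```python
def parse_headers(raw: bytes) -> dict:
    """Parse a response header block leniently into {name: value}."""
    headers = {}
    for line in raw.split(b"\n"):
        text = line.decode("latin-1").strip()
        if ":" not in text:
            continue  # skips the status line and blank lines
        name, value = text.split(":", 1)
        # Normalize case, and map "Content Type" to "content-type".
        name = name.strip().lower().replace(" ", "-")
        headers[name] = value.strip()
    return headers

def find_boundary(headers: dict):
    """Return the multipart boundary parameter, or None if it is absent."""
    ctype = headers.get("content-type", "")
    for param in ctype.split(";")[1:]:
        key, _, val = param.partition("=")
        if key.strip().lower() == "boundary":
            return val.strip().strip('"')
    return None
```

If [tt]find_boundary[/tt] comes back as None, the camera did not declare a usable boundary, and you would need some fallback way of finding the frames in the body.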
So far so good; it just means a lot of programming and testing. Then, if you have to support SSL/TLS, you will need a third-party replacement for the Winsock control.
Once you have that done you need to decode the streamed content body data as you receive it chunk by chunk. Each time you have a frame of JPEG parsed out you can present that and discard those bytes of the stream. Rinse, repeat... until you abort the connection or the server closes it on you.
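The "parse a frame, present it, discard those bytes" loop can be sketched as below. Note that this is a shortcut, not a proper multipart parser: it ignores the boundary lines and part headers and simply scans for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. That is an assumption that can break if a frame embeds a thumbnail, since the thumbnail carries its own markers. The class name [tt]FrameExtractor[/tt] is made up. It is Python again, but it is only buffer searching and trimming, which you could do with byte arrays or InStrB in VB6.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

class FrameExtractor:
    """Accumulates stream chunks and hands back complete JPEG frames."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add one received chunk; return any frames completed by it."""
        self._buf += chunk
        frames = []
        while True:
            start = self._buf.find(SOI)
            if start < 0:
                # No frame start yet. Discard everything except a trailing
                # 0xFF, which could be the first half of a split marker.
                keep = 1 if self._buf.endswith(b"\xff") else 0
                del self._buf[:len(self._buf) - keep]
                break
            end = self._buf.find(EOI, start + 2)
            if end < 0:
                # Frame is incomplete: drop the part headers before it
                # and wait for more data.
                del self._buf[:start]
                break
            frames.append(bytes(self._buf[start:end + 2]))
            # Discard the consumed bytes so memory use stays bounded.
            del self._buf[:end + 2]
        return frames
```

You would call [tt]feed[/tt] from your data-arrival handler with each chunk and display whatever frames it returns. The buffer never holds much more than one frame plus the current chunk, which is the point of the exercise.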
The last issue I can think of is that if your network is too slow or your program can't process frames fast enough, the server will eventually close the connection on you. So you need either to set your cameras for low frame rates and small image sizes, or else to have a fast network, well-optimized code, and nothing that ever steals much CPU from your M-JPEG stream processing logic.
These MJPEG cameras do not seem to drop frames on TCP congestion push-back; they drop the connection. As far as I can tell from how this content type works, they have no choice.
Another solution is to use something like H.264 instead of M-JPEG over HTTP. This is less well documented and a lot harder to handle in VB6, but it doesn't require as much network bandwidth. It probably also provides for frame dropping on congestion.
I could see using C++ to create such an ActiveX control but I'd have serious doubts about doing it in a .Net language. VB6 will be superior in terms of performance because it doesn't have the potentially fatal handicap of .Net Garbage Collection. And as far as I can tell all of the .Net Framework's HTTP client libraries have the same "leak memory until end of response" issue.