Downloading linked images from a website
Andrew Henderson
Is it possible to download all .jpg and .png files linked from a website? I want to download the images from every post, in every thread of [this forum][1], that contains a link. For example, [this post][2] contains a link to [this file][3].
I've tried wget:

    wget -r -np

It copied all the HTML files of that thread. However, I don't know why it jumps from ...thread?comment=336 to ...thread?comment=3232, when it had been going one by one up to comment 336.
2 Answers
Try this command:

    wget -P path/where/save/result -A jpg,png -r

According to the wget man page:

    -A acclist --accept acclist
        Specify comma-separated lists of file name suffixes or patterns to accept or reject (@pxref{Types of Files} for more details).

    -P prefix
        Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is . (the current directory).

    -r --recursive
        Turn on recursive retrieving.

Try this:

    mkdir wgetDir
    wget -P wgetDir

This command fetches the HTML page and saves it in wgetDir. When I ran it, I found this file:

    340782-official-digital-rendering-thread?page=145

Then I tried this command:

    wget -P wgetDir -A png,jpg,jpeg,gif -nd --force-html -r -i "wgetDir/340782-official-digital-rendering-thread?page=145"

and it downloads images. So it seems to work, although I do not know whether these pictures are the ones you want to download.
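As a rough illustration of what the accept list in the command above selects, here is a minimal shell sketch that lists the image links found in an already saved page, without touching the network. The function name `list_image_links` is hypothetical, and the regular expression is only an approximation of the idea: it is not how wget itself parses HTML.

```shell
# Print every http(s) URL in a saved HTML file that ends in an accepted
# image suffix (png, jpg, jpeg, gif), one per line, without duplicates.
# Assumption: links are absolute and contain no quotes, spaces or angle brackets.
list_image_links() {
    # $1: path to a saved HTML page
    grep -oE "https?://[^\"' <>]+\.(png|jpe?g|gif)" "$1" | LC_ALL=C sort -u
}
```

If the list looks right, it could be handed to wget, which reads URLs from standard input when given `-i -`, for example `list_image_links "wgetDir/340782-official-digital-rendering-thread?page=145" | wget -P wgetDir -nd -i -`.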
#include <stdio.h>
#include <stdlib.h> // for system()

int main(void)
{
    // A post body starts at body[] and ends at notes[]; image links follow img[].
    char body[] = "forum-post-body-content", notes[] = "p-comment-notes",
         img[] = "img src=", link[200], cmd[300] = {0}, file[16];
    int c, pos = 0, pos2 = 0, fin = 0, i, num = 0, found = 0;
    FILE *fp;

    for (i = 1; i <= 149; ++i) {
        // Download page i of the thread; the thread URL is missing after the opening quote.
        sprintf(cmd, "wget -O page%d.txt '", i, i);
        system(cmd);
        sprintf(file, "page%d.txt", i);
        fp = fopen(file, "r");
        if (fp == NULL) {
            fprintf(stderr, "Can't open %s\n", file);
            continue;
        }
        while ((c = fgetc(fp)) != EOF) {
            if (body[pos] != c) {
                pos = 0;
                continue;
            }
            if (pos != 22) { // body[] not fully matched yet
                ++pos;
                continue;
            }
            // Inside a post: scan until notes[] is fully matched or the file ends.
            pos = 0;
            while (fin == 0 && (c = fgetc(fp)) != EOF) {
                if (notes[pos] == c) {
                    if (pos == 14) { // end of the post
                        fin = 1;
                        pos = -1;
                    }
                    ++pos;
                } else {
                    pos = 0;
                }
                if (img[pos2] != c) {
                    pos2 = 0;
                    continue;
                }
                if (pos2 != 7) { // img[] not fully matched yet
                    ++pos2;
                    continue;
                }
                // Copy the quoted link up to a closing g" (jpg, png, jpeg),
                // stopping at end of file or when link[] is full.
                pos2 = 0;
                while (!found && pos2 < (int)sizeof link && (c = fgetc(fp)) != EOF) {
                    link[pos2] = c;
                    if (pos2 > 0 && link[pos2 - 1] == 'g' && c == '"')
                        found = 1;
                    ++pos2;
                }
                if (found) {
                    link[pos2 - 1] = '\0'; // overwrite the closing quote
                    // link[0] is the opening quote, so the URL starts at link + 1
                    snprintf(cmd, sizeof cmd, "wget -O /home/arturo/Dropbox/Digital_Renders/%d '%s'", ++num, link + 1);
                    system(cmd);
                }
                found = 0;
                pos2 = 0;
            }
            fin = 0;
            pos = 0;
        }
        // closing file
        fclose(fp);
        if (remove(file))
            fprintf(stderr, "Can't remove file\n");
    }
    return 0;
}
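For comparison, the extraction the C program performs (image links that end in g" and sit between the forum-post-body-content and p-comment-notes markers) might be sketched in shell with awk. This is a sketch under assumptions, not a drop-in replacement: it works line by line rather than character by character, so markers and images that share a line are handled only approximately, and the function name `extract_post_images` is hypothetical.

```shell
# Print the src of every <img> whose link ends in "g" (jpg, png, jpeg)
# and that appears between the two marker strings used by the C program.
extract_post_images() {
    # $1: path to a saved forum page
    awk '
        /forum-post-body-content/ { in_post = 1 }
        in_post {
            line = $0
            while (match(line, /img src="[^"]*g"/)) {
                # skip the 9 characters of the img src=" prefix, drop the closing quote
                print substr(line, RSTART + 9, RLENGTH - 10)
                line = substr(line, RSTART + RLENGTH)
            }
        }
        /p-comment-notes/ { in_post = 0 }
    ' "$1"
}
```

To number the downloads the way the C program does, the output could be read in a loop, for example `n=0; extract_post_images page1.txt | while read -r url; do n=$((n + 1)); wget -O "$n" "$url"; done`.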