June 7, 2013
Archiving Your Xanga
I was going to download an archive to see how it compared to the last time I archived this site, in 2009 (unless I'm misremembering, they now have a way to archive your photos, pulses, etc.), but the page it directs me to doesn't exist. You continue to fill me with confidence, Xanga Team.
In any case, the main reason I wanted to see how the archive looked this time was that, last time, it was just a series of .htm files. Any layout you had, your pulses, and your images aren't included in it (I *imagine* that images hosted on other sites would still appear, given an internet connection). Also, having been labeling my posts with tags for the past year (I think I've made it back to 2009 but haven't gotten all the way through yet), there's no way to tap into that very valuable organization method (particularly given that the search function will most likely be disabled).
So I started looking into alternative ways to archive my site.
Yet again reaffirming my love for the terminal (as well as wget), I figured out a way to do it (using wget, obviously).
If you have Linux, a Mac, BSD, or Solaris, simply open up a terminal and run wget with the flags -m, -k, -K, -E, and -p, followed by the URI of your Xanga (so, in my case, I put in wget -m -k -K -E -p thirst2.xanga.com). Then press Enter.
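For reference, those single-letter flags have long-form names in GNU wget, which can make the command easier to read when you come back to it later. As far as I know, this is the exact same command spelled out:
wget --mirror --convert-links --backup-converted --adjust-extension --page-requisites thirst2.xanga.com
(Older versions of wget call --adjust-extension by the name --html-extension instead.)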
It will create a new folder titled after your site and download everything there. The -m flag "turns on recursion and time-stamping, sets infinite recursion depth[,] and keeps FTP directory listings". Basically, it makes sure wget downloads more than just the first page.
The -k flag "convert[s] the links in the document to make them suitable for local viewing". Basically, it makes sure that, if you click on a link, it'll direct you to file:///home/[your home folder's name]/[your xanga's name].xanga.com/etc instead of to http://[your xanga's name].xanga.com/etc.
The -K flag will back up the original files, unaltered, with the file extension .orig (that'll basically be useless if Xanga is taken down, but I figure it doesn't hurt to have the originals just in case; that, or it's my packrattiness talking).
The -E flag will, "[i]f a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with the [file extension .html], […] cause the suffix .html to be appended to the local filename".
Lastly, the -p flag will get images and other embedded objects (I haven't seen how well that works yet; thus far, wget hasn't downloaded any images, and I very much want the images I uploaded to this site. Towards the end here, I started uploading them to tinypic.com before using them in a post, but I certainly didn't in the beginning).
Edit: It's not downloading the images because they're often at URIs like http://x01.xanga.com/etc. Since that isn't thirst2.xanga.com, wget thinks it's not supposed to get them. I haven't tested it yet, but adding the -H flag should fix that; it enables "spanning across hosts when doing recursive retrieving". Of course, this also means that any pictures I uploaded to tinypic.com and linked to will also be downloaded (even though tinypic isn't going down anytime soon…), and maybe YouTube stuff as well… Then again, it also means that any videos I uploaded will be downloaded, so that's not bad. I'll let you know if it works once I get to try it.
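If you want to try it before I do, the command I have in mind looks something like this (untested, and the -D list, which is supposed to keep wget from wandering beyond Xanga's own domains, is just my best guess):
wget -m -k -K -E -p -H -Dxanga.com thirst2.xanga.com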
Edit: Alright, the -H flag, even with the -D flag set to the domains it's supposed to restrict itself to, likes to download anything linked from your site and tries to download all of Xanga. This shouldn't even be necessary anyway, since the -p flag is supposed to download all images, etc., including those that might be hosted off the site; x01.xanga.com/whatever/whatever.jpg should be downloaded, seeing as it's pretty directly embedded. Sad; wget is such a useful and powerful tool otherwise.
So, the Debian software repositories had WebHTTrack and I downloaded that. It's in the process of downloading my site and it seems to be doing a good job. It's a bit overkill in that it's downloading everything linked from tinypic and also a PDF version of the New York Times article I linked to, but it is not, notably, downloading other people's Xangas. It's just being thorough, which I suppose is nice in the event that tinypic goes down or something. I should end up with an archive that works entirely locally and thoroughly, without an internet connection (which is more than I was expecting before). Go to http://www.httrack.com/ to get the program. They have a version for Windows, Mac, Linux, and BSD. The Mac one looks a little bit more complicated, but the Linux and Windows versions are nicely straightforward.
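If you'd rather stay in the terminal, the same project also provides a plain httrack command; something along these lines should do roughly what the WebHTTrack wizard does (the +filter is just my guess at what's needed to also pull in the images hosted at x01.xanga.com and the like):
httrack "http://thirst2.xanga.com/" -O ./xanga-archive "+*.xanga.com/*"
-O sets the folder the archive gets written to, and the quoted +pattern tells it which extra URLs it's allowed to follow.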
Beyond the possible image hiccup, wget didn't seem to download the CSS stylesheets the first time I tried it (though, admittedly, that was without the -p flag since I just wanted to run a quick test). If it doesn't, grabbing them yourself is very simple. Just open the first index.html file you get (the one that corresponds to your home page) in any text editor, search for "css", and download the links that it finds (mine had two). With those inside the first folder, everything should render nicely.
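If you'd rather not fish through the HTML by hand, something like this should grab them for you (run from wherever wget put the archive; the grep pattern is just a rough way of pulling out anything in the home page that looks like a stylesheet URL):
cd thirst2.xanga.com
grep -o 'http://[^"]*\.css' index.html | xargs wget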
Even if the images don't get downloaded, this leaves you with a set of local files that should look pretty much exactly as you had it (since background images had to be hosted on outside sites, I believe) and can be navigated just as you used to navigate your site, including pulses and (with any luck) images.
The one drawback is that you'll probably have to set all private and protected posts to public given that you have to be signed in to see those (and you obviously can't be signed in if the Xanga site is down or moved to Xanga 2.0). I'll let you know when this first archive finishes.
If you have Windows, the above should work fine if you have Cygwin installed (though, if you have Cygwin installed, you would have known you could do the above the second I said "terminal" and not had to read this far). If none of what I've mentioned has applied to you yet, then I'm less certain what will work (given I won't have tried it).
Apparently WebZip and HTTrack are programs that could get the job done but I've never used them.
Also, I probably needn't tell anyone, but archiving your site will take some time, particularly if you've been here a while. This process will probably take longer than the usual archive Xanga provides because it's not just downloading your individual entries but also every page that would be displayed if you clicked the "Older" and "Newer" buttons at the bottom of your main page, as well as all the images on the site, etc.
Also, I'm going to tag some of you wonderful people who I used to follow on here but have since left Xanga. I know some of you used this site as a journal of less pleasant times (and perhaps you wanted to forget that) or stopped blogging as much as five years ago and find the old information useless, but I'm doing it in case you want to save this stuff and probably won't otherwise be made aware that Xanga is closing.
@NatalieTheSaint, @escapist767/@Alyxandri, @LiquidityOfSelf, @FlyAway180, @Opaque_Life, @kassandrag, @opticalnoise, @peloha, @leviculus, @The_Ferocious_Lam, @stephanieoakley01, @iknowyou12345, @xNicolax, @mayjun, @back_2_basic_love, @cermetk, @LaRuralMetroFemme, @SkygreenII, @erinjessicaxox, @avariellefaye, @stephysturt824, @desertraindrop86
Comments (5)
i had problems archiving an old account but it eventually worked
keep trying
good luck
I saved 10 entries yesterday by copying them into Word. It was a really annoying task. Didn't even know there were programs to do that, lol. I would love it if my pictures could be saved too.
@under_the_carpet – Heh, yeah, I have far too many posts to possibly consider doing it by hand. I know a lot of companies that want to gather stats and the like will scrape sites so I figured there would have to be a way. I could probably get wget to get pictures as well (and it'd probably be able to nab the CSS stylesheet since it's hosted on the domain css.xanga.com) but I haven't been able to get it yet. Seems the -H flag just downloads any other site you link to, so you'll eventually end up spidering along the whole of Xanga. Not quite what I want…
Thank you. I am still a little stunned about the closure but for some reason I did not understand any of that lingo. Oh well, guess I have to go and bug Geek Squad as well as find a dumb girl alternative.
@Jade_Orchid – Haha, don't bother with wget; it didn't work as well as I expected. Go to http://www.httrack.com/ and download the program. I found it worked like a charm and the walkthrough was pretty straightforward.