CSV support

Rodent
Posts: 90
Joined: Mon Dec 15, 2003 11:23 pm

CSV support

Post by Rodent »

One thing that I think would really make UE a hit is CSV support. CSV files are being used more and more in picture groups, but they can be used for any type of file, MP3s for example. CSV files include CRC values, so you can see both whether a file belongs to a particular set when several files look identical, and whether it is OK or has been corrupted, changed or tampered with.

I imagine something like UE posting the CSV data (filesize and CRC value; the filename is already there) in the header. Other people with UE can then import CSV files with data for the files they need and scan one or several groups for those files. If UE finds any of them, it downloads them automatically, or creates a list you can choose from, or whatever.

I think it would be fairly simple (at least in principle) to implement actually.

Some kind of interaction with a CSV database program like PicCheck (The!Checker - http://mitglied.lycos.de/vhorch/piccheck.htm ) might also be an option.
alex
Posts: 4515
Joined: Thu Feb 27, 2003 5:57 pm

Post by alex »

UE can post SFV; is it about the same?

An MD5 sum is more reliable in the strict sense, but CRC is quite OK as well.

To find files you need the message-id, and the MD5 hash value is not known (I didn't read about the files, but I think they are generic, not Usenet-specific).
Rodent
Posts: 90
Joined: Mon Dec 15, 2003 11:23 pm

Post by Rodent »

Actually, the CRC check is not the important thing here. The point is that these CSV files contain data, including filenames, for sometimes hundreds or thousands of files that people need. If you could load these CSV files into UE and have it automatically do a header search on the filenames in selected groups and list all hits, you could find the matching files very quickly. A manual search for each file would take an awfully long time and a lot of work, but this way it could be done in seconds or minutes. Currently I don't think there is any newsreader capable of doing this.

Filesize (number of bytes) and CRC value are also included in these CSV files and could be used to refine the search, but basically a search on filename would be enough. PicCheck will do the byte and CRC check anyway, after download. And if you stick to searching on filename only, it would be very simple to implement.
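
A minimal sketch of that filename-only search, assuming the headers are available as plain subject strings. The function name `find_hits` and the subject lines are made up for illustration; this is not how UE works internally.

```python
def find_hits(wanted_names, subjects):
    """Return (subject, filename) pairs for subjects that contain a wanted filename."""
    wanted = [(name, name.lower()) for name in wanted_names]
    hits = []
    for subject in subjects:
        lowered = subject.lower()
        for name, key in wanted:
            if key in lowered:
                hits.append((subject, name))
                break  # one hit per subject is enough for a listing
    return hits


# Made-up subject lines, only to show the shape of the input.
subjects = [
    'Signs collection - "126883_1576.jpg" (1/12)',
    'Something unrelated - "holiday.jpg" (1/3)',
    'Architecture fills - "93494_2489.jpg" (1/2)',
]
wanted = ["126883_1576.jpg", "93494_2489.jpg", "96950_4994.jpg"]

for subject, name in find_hits(wanted, subjects):
    print(name, "<-", subject)
```

A plain substring test like this can give false hits (a wanted "1.jpg" also matches "11.jpg"), which is where the filesize and CRC values could refine the result.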

Here's a couple of samples from CSV files, so you can see the data structure. The first element is filename, then filesize, then CRC value, then folder structure (used by PicCheck to sort files into folders according to the original folder structure of the file collection):


126883_1576.jpg,1583743,D226CCE0,\signs\,
128542_1382.jpg,3261051,E8F62DDC,\signs\,
128891_8047.jpg,2657495,AB996687,\signs\,
91120_3013.jpg,1656923,E9E58A2A,\architecture\,
93494_2489.jpg,241478,C709A7AD,\architecture\,
93734_4651.jpg,2005851,C576B94B,\architecture\,
93735_1037.jpg,1662032,C13AEA68,\architecture\,
96950_4994.jpg,295419,E56F6BAE,\architecture\,
102556_4922.jpg,1678845,0BCB8769,\bikes and motorcycles\,
104083_9686.jpg,308089,9EE9220B,\bikes and motorcycles\,
105972_9954.jpg,795785,C4B48513,\bikes and motorcycles\,
110354_4705.jpg,599614,3BB92C92,\bikes and motorcycles\,

01_Besame Mucho.mp3,4894590,F2874058,\The Beatles - Unsurpassed Masters Vol 1\,
02_How Do You Do It.mp3,3888148,A54FDD01,\The Beatles - Unsurpassed Masters Vol 1\,
03_There's A Place.mp3,4511742,E469B08D,\The Beatles - Unsurpassed Masters Vol 1\,
04_I Saw Her Standing There.mp3,9578252,553F0069,\The Beatles - Unsurpassed Masters Vol 1\,
05_Do You Want To Know A Secret.mp3,4052836,1A146B68,\The Beatles - Unsurpassed Masters Vol 1\,
06_A Taste Of Honey.mp3,4315302,9D19077C,\The Beatles - Unsurpassed Masters Vol 1\,
07_There's A Place.mp3,4412268,512B259E,\The Beatles - Unsurpassed Masters Vol 1\,
08_I Saw Her Standing There.mp3,7080525,159193A4,\The Beatles - Unsurpassed Masters Vol 1\,
09_Misery.mp3,9553993,244BCD7E,\The Beatles - Unsurpassed Masters Vol 1\,
10_From Me To You.mp3,6777081,1613F57F,\The Beatles - Unsurpassed Masters Vol 1\,
11_From Me To You.mp3,7904735,C0B22C17,\The Beatles - Unsurpassed Masters Vol 1\,
12_Thank You Girl.mp3,4464095,D90C285F,\The Beatles - Unsurpassed Masters Vol 1\,
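
For anyone wanting to read these lines in a program, here is a rough sketch using Python's standard `csv` module. The `Entry` class and `parse_collection_csv` name are made up; the field order is the one described above, and the trailing comma on each line simply produces an empty fifth field that is ignored.

```python
import csv
from dataclasses import dataclass


@dataclass
class Entry:
    filename: str
    size: int      # filesize in bytes
    crc: str       # CRC value as hex digits, as written in the CSV
    folder: str    # original folder structure


def parse_collection_csv(lines):
    """Parse lines of the form: filename,filesize,CRC,folder,"""
    entries = []
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        name, size, crc, folder = row[:4]
        entries.append(Entry(name, int(size), crc.upper(), folder))
    return entries


sample = [
    r"126883_1576.jpg,1583743,D226CCE0,\signs\,",
    r"03_There's A Place.mp3,4511742,E469B08D,\The Beatles - Unsurpassed Masters Vol 1\,",
]
for entry in parse_collection_csv(sample):
    print(entry)
```

This assumes filenames never contain a comma; the samples do not show how such names would be written, so that case is left open.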
jaapf
Posts: 203
Joined: Thu Sep 11, 2003 3:06 pm

Post by jaapf »

Rodent wrote: <cut>hundreds or thousands of files
<cut>search on each file
Won't that generate hundreds or thousands of searches on the search server?
I tend to think that it will generate massive loads, which will ultimately lead to the closing of the free service.

You could try to get one of the commercial NZB sites to support it?
At least they can make a buck on the generated data stream ;)
Dutch? Visit the Dutch UE/Newspro forum at: http://www.binaries4all.nl
Nederlandse UE handleiding op http://www.binaries4all.nl/ue
English UE tutorial online at http://www.binaries4all.com/ue/
Handy links at: http://jpfx.zapto.org/
dengle
Posts: 274
Joined: Mon Jun 30, 2003 2:37 pm

Post by dengle »

What generates these CSVs? Is it PicChecker? Where does all the info inside it come from? I understand that the CRC, filesize and filename come from the actual file, but what is the directory info used for by someone who would be downloading it?

How would a CSV file help if the poster RARed or zipped up the pictures? Or don't they usually do that in the picture groups you browse?
Josef K
Posts: 534
Joined: Thu Feb 27, 2003 7:29 pm

Post by Josef K »

I'll add my thoughts on this. I've requested the inclusion of CSV support before, but Alex didn't seem to think it was a high priority. The request I made was more along the lines of using CSVs for file checking, à la BNR2/3.

The way it works in BNR is to read the CSV and highlight matching files in the header view once group(s) are loaded. BNR is configurable to display files with different colours and font bolding/italicising so you can easily see at a glance if you have already downloaded a file. BNR goes further to run a filecheck on user-selectable disks or directories so you can see if the file(s) exist on disk. This has the added benefit whereby you can launch the file on disk right from BNR and, if the image set has an index, you can see if it matches the one in the index. This can help if two files have the same name but different contents.

Additionally, BNR has a configurable spin control where you can specify a percentage difference. This, for example, can match a file existing in a CSV to one found in the header view but with a 5% (default) filesize difference. This accounts for cases when files are shown as having slightly different sizes to the ones you have already downloaded but are the same. This can occur when posting software isn't exactly up to par with the best of them.
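
As a rough sketch of that percentage match (my own guess at the rule, not BNR's actual code), the size from the CSV can be compared with the size shown in the header like this. The name `size_within_tolerance` is made up.

```python
def size_within_tolerance(csv_size, header_size, percent=5.0):
    """True if header_size differs from csv_size by at most `percent` percent."""
    if csv_size <= 0:
        return header_size == csv_size
    return abs(header_size - csv_size) <= csv_size * percent / 100.0


# A file listed as 1583743 bytes, shown in the header as 1600000 bytes.
print(size_within_tolerance(1583743, 1600000))
```

Here the tolerance is taken relative to the size in the CSV; taking it relative to the header size would be an equally plausible reading.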

Another feature of BNR I'd find useful in UE is the use of CSVs to mark Want filters. If you select a directory of CSVs or select single CSVs in BNR's Want dialogue, you can set it to automatically download all matching files found in the CSV or to highlight the matching files in the header view so you can manually pick them out first. I think this is close to what Rodent would like to see in UE.

All of this, of course, relies on UE, like BNR, being able to guess the filename from the subject line. Where filenames are enclosed in quotes (as they should be) it's fairly easy for software to pick the filename out from the rest. BNR has a separate 'Filename' column, along with the usual 'Subject', 'Author', 'Date', etc. As much as I find UE fantastic for all binary groups and text, I still to this day use BNR for image groups for the reasons I mention above. I'm just glad I don't have an awful lot of image groups I check regularly.
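
To show what guessing the filename might look like, here is a small sketch, not BNR's or UE's real logic. It takes the first quoted string if there is one and otherwise falls back to the first token that ends in a short extension. The names `guess_filename`, `QUOTED` and `BARE` and the subject lines are made up.

```python
import re

# A filename enclosed in double quotes.
QUOTED = re.compile(r'"([^"]+)"')
# Fallback: a whitespace-free token ending in a 2-4 character extension.
BARE = re.compile(r'(\S+\.[A-Za-z0-9]{2,4})(?=\s|$)')


def guess_filename(subject):
    """Return the most likely filename in a subject line, or None."""
    match = QUOTED.search(subject)
    if match:
        return match.group(1)
    match = BARE.search(subject)
    return match.group(1) if match else None


print(guess_filename('Signs - "126883_1576.jpg" yEnc (1/12)'))
print(guess_filename('Signs - 126883_1576.jpg (1/12)'))
```

The fallback cannot handle unquoted names that contain spaces (it would pick only "Mucho.mp3" out of "01_Besame Mucho.mp3"), which is one more reason to quote filenames when posting.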
Rodent
Posts: 90
Joined: Mon Dec 15, 2003 11:23 pm

Post by Rodent »

jaapf wrote:
Rodent wrote: <cut>hundreds or thousands of files
<cut>search on each file
Won't that generate hundreds or thousands of searches on the search server?
I tend to think that it will generate massive loads, which will ultimately lead to the closing of the free service.
You could limit it to only searching internally in UE. That's good enough I think.
Rodent
Posts: 90
Joined: Mon Dec 15, 2003 11:23 pm

Post by Rodent »

dengle wrote: What generates these CSVs? is it PicChecker? Where does all the info come from inside it? I understand that the crc, filesize and filename come from the actual file, but what is the directory info used for by someone that would be downloading it?
It's basically used for collections of files. Someone creates a collection and shares it with others, and includes the collection CSV. They then use the collection CSV to recreate the folder structure, and check if all files are there and are OK.
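
As an illustration of how the folder field can be used to recreate the structure (a sketch only, not PicCheck's code), the backslash-separated field can be turned into a path under a local collection folder. The name `target_path` and the folder name "my_collection" are made up.

```python
from pathlib import Path


def target_path(collection_root, folder_field, filename):
    """Map a CSV folder field and filename to a path under collection_root."""
    # Split on backslashes, dropping empty pieces and any "." or ".." parts
    # so that a CSV cannot point outside the collection folder.
    parts = [p for p in folder_field.split("\\") if p and p not in (".", "..")]
    return Path(collection_root).joinpath(*parts, filename)


print(target_path("my_collection", "\\bikes and motorcycles\\", "102556_4922.jpg"))
```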

But there are other features as well, e.g. you can generate a report CSV from your collection which tells you which files are missing or corrupted. Then you post this report CSV in the group, asking for fills. The "filler" then loads the CSV into PicCheck and asks the program to pick the requested fills from his own collection. They will then automatically be placed in his upload folder, ready to post. The requester then downloads the fills into his download folder, from which PicCheck automatically sorts them into the collection, according to the collection CSV.
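
To give an idea of what building such a report involves, here is a sketch under a couple of assumptions: the CRC in the CSV is taken to be a standard CRC-32 (the usual meaning of an 8-digit hex CRC, but not confirmed for PicCheck), and `file_crc32` and `build_report` are made-up names.

```python
import os
import zlib


def file_crc32(path):
    """Return the CRC-32 of a file as 8 uppercase hex digits."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"


def build_report(entries, root):
    """Return the entries whose file is missing or fails the size/CRC check.

    entries: list of (filename, size, crc, folder) tuples from the collection CSV.
    root: local folder that holds the collection.
    """
    report = []
    for filename, size, crc, folder in entries:
        parts = [p for p in folder.split("\\") if p]
        path = os.path.join(root, *parts, filename)
        if (not os.path.isfile(path)
                or os.path.getsize(path) != size
                or file_crc32(path) != crc.upper()):
            report.append((filename, size, crc, folder))
    return report
```

The rows returned are the ones that would go into the report CSV; the size is checked before the CRC so that files of the wrong length are rejected without being read.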

The requested search feature is basically for searching through groups for other fill posts which may contain some of the files you need yourself. If, for example, someone posts a fill request of say 1000 files for a particular collection, and you only need say 100 from that collection, you can quickly scan the post using your report CSV to see if any of the files you need are among them. If you're lucky, you can find all the files you need for your collection that way, from other posts. The alternative is either to find and download the files manually (a lot of work, and perhaps none of them are ones you need anyway), or to download the whole thing and then ask PicCheck to pick those you need and dump the rest. So often it's easier just to post your own request, adding to the redundancy of files.

So generally this CSV search feature could save a lot of time and bandwidth, as well as server space.

dengle wrote: how would a CSV file help if the poster rared or zipped up the pictures? or don't they usually do that in the picture groups you browse?
In the picture groups I visit, it's rare for people to zip or RAR pics. But PicCheck can actually handle ZIP and RAR files as well, and pull the files from them automatically.