Bravo List

Poor duplicate torrent detection in takeupload.php (http://www.bvlist.com/showthread.php?t=9614)

shadowfox 30th October 2013 06:07

Poor duplicate torrent detection in takeupload.php
 
With TBDev.2009(Final).rev.295 I've noticed a lot of duplicate torrents being posted. That seemed like something the code would guard against, so I checked takeupload.php and found it has only a basic MySQL check to try to prevent it.

Code:

    // Insert the new torrent. Duplicates are caught only by the UNIQUE indexes
    // on the torrents table (info_hash and name): MySQL refuses the row on a clash.
    $ret = mysql_query("INSERT INTO torrents (search_text, filename, owner, visible, info_hash, name, size, numfiles, type, descr, ori_descr, category, save_as, added, last_action, nfo, client_created_by) VALUES (" .
        implode(",", array_map("sqlesc", array(searchfield("$shortfname $dname $torrent"), $fname, $CURUSER["id"], "no", $infohash, $torrent, $totallen, count($filelist), $type, $descr, $descr, 0 + $_POST["type"], $dname))) .
        ", " . time() . ", " . time() . ", $nfo, $tmaker)");
    if (!$ret) {
      // 1062 = ER_DUP_ENTRY, MySQL's duplicate-key error: the torrent "already exists"
      if (mysql_errno() == 1062)
        stderr($lang['takeupload_failed'], $lang['takeupload_already']);
      // Any other database error (TBDev's stderr() ends the script, so only one message is shown)
      stderr($lang['takeupload_failed'], "mysql puked: ".mysql_error());
    }
    $id = mysql_insert_id();

This has been basically unchanged since before TBDev; I found the same function in TBSource from years prior. Has anyone modified their query to better avoid duplicates? I'm a novice at MySQL queries, so I'm not totally sure what it checks to decide whether the torrent already exists, but it's certainly not foolproof: I see many duplicates getting posted even though the same file has already been uploaded in a torrent with the same name.

I couldn't find anything on Google suggesting a better way to do it. It's been an issue across versions; surely someone has done it better by now? :sos:

DND 30th October 2013 10:48

Well, of course it accepts duplicate torrents if the name changes by even one character. Say you have MyTorrent.h264, and then someone uploads Mytorrent-h264 or MyTorrent-H264.
If the name isn't the same as a previously uploaded torrent's, the duplicate gets through.
I changed how takeupload handles punctuation and spaces.
Basically I inverted the code: every space becomes punctuation, so every torrent ends up named like MyTorrent.h264.xxx.aac
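A minimal PHP sketch of that kind of name normalization. dot_name() and dupe_key() are hypothetical helpers, not part of TBDev: dot_name() turns spaces and underscores into dots for display and RSS, while dupe_key() additionally treats hyphens like dots and ignores case, so MyTorrent.h264, Mytorrent-h264 and MyTorrent-H264 all map to the same comparison key.

```php
<?php
// Hypothetical helper (not part of TBDev): canonical dotted name for display/RSS.
function dot_name(string $name): string
{
    // Spaces and underscores become dots; a run of them becomes a single dot.
    $name = preg_replace('/[\s_]+/', '.', trim($name));
    // Collapse any ".." this produced and strip dots at either end.
    return trim(preg_replace('/\.{2,}/', '.', $name), '.');
}

// Hypothetical helper: comparison key only, never shown to users.
// Runs of dots and hyphens collapse to one dot, and case is ignored.
function dupe_key(string $name): string
{
    return strtolower(preg_replace('/[.\-]+/', '.', dot_name($name)));
}

echo dot_name('My Torrent_h264  aac'), PHP_EOL;                       // My.Torrent.h264.aac
var_dump(dupe_key('Mytorrent-h264') === dupe_key('MyTorrent.H264'));  // bool(true)
```

Keeping hyphens in the displayed name and folding them only in the key is a design choice: release-group suffixes often use a hyphen. The key only helps against duplicates if it is stored (for example in an extra, hypothetical column) and compared against existing rows.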

joeroberts 30th October 2013 13:08

Just do a check on info_hash.
They may change the name, but that will stay the same.
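The INSERT above already relies on a UNIQUE index on info_hash, but an explicit lookup first would let the site tell the uploader which torrent is already there. A minimal sketch, assuming a PDO connection `$db` instead of the old mysql_* functions (that extension was removed in PHP 7) and a hypothetical helper name existing_torrent_id(); `$infohash` is passed in whatever form takeupload.php stores it.

```php
<?php
// Hypothetical helper: return the id of a torrent with this info_hash, or null.
function existing_torrent_id(PDO $db, string $infohash): ?int
{
    // Prepared statement, so the binary/hex hash needs no manual escaping.
    $stmt = $db->prepare('SELECT id FROM torrents WHERE info_hash = ? LIMIT 1');
    $stmt->execute([$infohash]);
    $id = $stmt->fetchColumn();          // false when no row matches
    return $id === false ? null : (int) $id;
}

// Possible use in takeupload.php, before the INSERT ($lang and stderr() as in TBDev):
// if (($dupe = existing_torrent_id($db, $infohash)) !== null) {
//     stderr($lang['takeupload_failed'], "Already uploaded as torrent #$dupe");
// }
```

The UNIQUE index should stay as the real guarantee: two uploads arriving at the same moment could both pass the lookup before either row is inserted.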

shadowfox 5th November 2013 19:30

Quote:

Originally Posted by DeNeDe (Post 43189)
Well, of course it accepts duplicate torrents if the name changes by even one character. Say you have MyTorrent.h264, and then someone uploads Mytorrent-h264 or MyTorrent-H264.
If the name isn't the same as a previously uploaded torrent's, the duplicate gets through.
I changed how takeupload handles punctuation and spaces.
Basically I inverted the code: every space becomes punctuation, so every torrent ends up named like MyTorrent.h264.xxx.aac

I don't think the torrent name is much of an issue: renaming doesn't change the info_hash, so those dupes would be caught. I've already altered the code to change spaces and underscores to dots, since that seems to work best for RSS users who filter on names.

Quote:

Originally Posted by joeroberts (Post 43193)
Just do a check on info_hash.
They may change the name, but that will stay the same.

That is mostly true, and I toyed with it a bit, but then I realized the real problem is which tool people use to create torrents and what settings they choose. The piece size (the size of the chunks the data is split into) changes the info_hash even if the file name and torrent name remain exactly the same, so the system treats the upload as an entirely new torrent.
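Why the piece size matters: per the BitTorrent specification (BEP 3), the info_hash is the SHA-1 of the bencoded info dictionary, which for a single file holds the name, the total length, the piece length and the concatenated SHA-1 hashes of every piece. A different piece length changes both that value and every piece hash. The self-contained sketch below uses a hypothetical minimal bencode() that only covers strings, integers and dictionaries.

```php
<?php
// Minimal bencode encoder (hypothetical helper): handles the integers, strings
// and dictionaries that a single-file "info" dictionary needs; no lists.
function bencode($value): string
{
    if (is_int($value)) {
        return 'i' . $value . 'e';
    }
    if (is_string($value)) {
        return strlen($value) . ':' . $value;
    }
    // Any array is encoded as a dictionary; keys must be sorted as raw byte strings.
    ksort($value, SORT_STRING);
    $out = 'd';
    foreach ($value as $key => $item) {
        $out .= bencode((string) $key) . bencode($item);
    }
    return $out . 'e';
}

// Build a single-file info dictionary for $data split into $pieceLength-byte pieces.
function build_info(string $name, string $data, int $pieceLength): array
{
    $pieces = '';
    foreach (str_split($data, $pieceLength) as $chunk) {
        $pieces .= sha1($chunk, true); // 20-byte binary SHA-1 of each piece
    }
    return [
        'name'         => $name,
        'length'       => strlen($data),
        'piece length' => $pieceLength,
        'pieces'       => $pieces,
    ];
}

// info_hash = SHA-1 of the bencoded info dictionary (shown here as 40 hex chars).
function info_hash(array $info): string
{
    return sha1(bencode($info));
}

$data  = str_repeat('example payload ', 4096); // 65,536 bytes of stand-in file content
$small = info_hash(build_info('MyTorrent.h264.mkv', $data, 16384)); // 16 KiB pieces
$large = info_hash(build_info('MyTorrent.h264.mkv', $data, 32768)); // 32 KiB pieces

echo $small === $large ? "same info_hash\n" : "different info_hash\n"; // different info_hash
```

Any other field inside the info dictionary, including the file name, has the same effect; only a title stored in the site database, outside the .torrent file, can change without altering the hash.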

I'm wondering if there is an easy way to cross-reference multiple values and build a better detection routine?

1. Don't allow duplicate hashes; this will catch some dupes.
2. Don't allow duplicate torrent names, except perhaps when the new file size is larger?
3. Don't allow a file with the same size whose name loosely matches an existing one?
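One way to combine the three rules, as a hypothetical PHP sketch: find_duplicate() checks a new upload against rows already fetched from the torrents table (each with id, info_hash, name, size). The normalisation in name_key() and the 80% similar_text() threshold for the loose match are assumptions to tune, not tested values.

```php
<?php
// Hypothetical helper: case-insensitive key treating spaces, underscores, dots and hyphens alike.
function name_key(string $name): string
{
    return trim(preg_replace('/[\s_.\-]+/', '.', strtolower($name)), '.');
}

// Returns a reason string if $new looks like a duplicate of a row in $existing, else null.
// $similarity is an assumed percentage threshold for rule 3's "loose" name match.
function find_duplicate(array $new, array $existing, float $similarity = 80.0): ?string
{
    $newKey = name_key($new['name']);
    foreach ($existing as $row) {
        // Rule 1: identical info_hash means the same torrent.
        if ($row['info_hash'] === $new['info_hash']) {
            return "same info_hash as torrent {$row['id']}";
        }
        $rowKey = name_key($row['name']);
        // Rule 2: same normalised name, unless the new upload is larger.
        if ($rowKey === $newKey && $new['size'] <= $row['size']) {
            return "same name as torrent {$row['id']}";
        }
        // Rule 3: exactly the same size and a loosely matching name.
        if ($row['size'] === $new['size']) {
            similar_text($rowKey, $newKey, $percent);
            if ($percent >= $similarity) {
                return sprintf('same size and %.1f%% similar name as torrent %d', $percent, $row['id']);
            }
        }
    }
    return null;
}

$rows = [['id' => 1, 'info_hash' => str_repeat('a', 40), 'name' => 'MyTorrent.h264', 'size' => 734003200]];
echo find_duplicate(['info_hash' => str_repeat('b', 40), 'name' => 'Mytorrent-h264', 'size' => 734003200], $rows), PHP_EOL;
// same name as torrent 1
```

Looping over every row won't scale; in practice one would presumably fetch only candidates in SQL first (same info_hash or same size) and keep the UNIQUE index on info_hash as the final safeguard.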

No single method seems effective on its own. Would creating a unique index on the size column be too drastic? There are already unique indexes on info_hash and the torrent name. Any idea what the odds are of two files being exactly the same size?
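For a rough answer to the odds question: if, purely as an assumption, total sizes were spread uniformly over M possible byte counts, the birthday bound estimates the chance that at least two of n torrents share exactly the same size. Real releases cluster around common target sizes, so real collisions would likely be more frequent than this.

```latex
P(\text{some two of } n \text{ torrents share a size}) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2M}\right)

% Illustrative numbers only: n = 10\,000 torrents, sizes uniform between
% 700 MB and 1.4 GB (decimal units), so M = 1.4\times10^{9} - 7\times10^{8} = 7\times10^{8}
\frac{n(n-1)}{2M} = \frac{10\,000 \times 9\,999}{2 \times 7\times10^{8}} \approx 0.0714,
\qquad P \approx 1 - e^{-0.0714} \approx 0.069
```

Under those assumptions there is roughly a 7% chance that a unique index on size would reject at least one legitimate upload, and the exponent grows roughly with n², so exact size seems better used as one signal among several than as a hard constraint.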


All times are GMT +2.