Added 3 more datasets.

- past-8-years

- 2004content-rarbg

- ipfs-QmbpRxBZ5HDZDVRoeAU8xFYnoP4r5eGCxdkmfFW3JbA6mq
Sleaze 2023-06-05 09:34:27 -07:00 committed by Mr. Sleaze
parent d56306ae12
commit 4ce4913ded
31 changed files with 267620 additions and 139 deletions

@ -0,0 +1,6 @@
# 2004content-rarbg
Source: https://github.com/2004content/rarbg
See also: [Original repository README.txt](README_orig.txt)

@ -0,0 +1,22 @@
rarbg
Backup of magnets from RARBG
Currently:
clean.py is my Python script for cleaning up magnets post-extraction. I think it might have some finicky thing going on with the way it fixes two magnets in one line, but it works.
moviesrarbg.txt holds my original post, cleaned up a lot. (117,392)
showsother.txt holds my original post, cleaned up a little. (137,671)
showsrarbg.txt holds my original post, cleaned up a lot. (11,699)
everything.7z holds everything that I've compiled (3,468,029)
--- everything.7z is complete ---
(and it would take a large new backup find to convince me otherwise)
(and convince me to update it)
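The kind of cleanup clean.py does (pull the hash and title out of each magnet, then drop duplicate hashes) can also be sketched with Python's standard URL parser. This is not what clean.py does internally, just a rough alternative; the names `parse_magnet` and `dedupe` are made up for this example, and the "40 hex characters" rule is an assumption about what counts as a valid hash here.

```python
from urllib.parse import parse_qs, urlsplit

PREFIX = "urn:btih:"
HEX_DIGITS = set("0123456789abcdef")


def parse_magnet(line):
    """Return (info_hash, title) for a magnet line, or None if unusable."""
    parts = urlsplit(line.strip())
    if parts.scheme != "magnet":
        return None
    params = parse_qs(parts.query)  # also percent-decodes the values
    xt = params.get("xt", [""])[0]
    if not xt.lower().startswith(PREFIX):
        return None
    info_hash = xt[len(PREFIX):].lower()
    # Assumption: only 40-character hex hashes are kept.
    if len(info_hash) != 40 or not set(info_hash) <= HEX_DIGITS:
        return None
    title = params.get("dn", [""])[0]  # empty string when there is no title
    return info_hash, title


def dedupe(lines):
    """Map each hash to one title, preferring a non-empty title."""
    best = {}
    for line in lines:
        parsed = parse_magnet(line)
        if parsed is None:
            continue
        info_hash, title = parsed
        if title or info_hash not in best:
            best[info_hash] = title
    return best
```

A title that contains a raw '&' would still be cut short by the parser, so this does not replace the tracker handling in clean.py.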
Some of the stuff in everything.7z did not come from RARBG, and filtering that out is my next step. I'm accepting suggestions for how to filter those. My current best idea is to write something to look for commonly formatted titles, like TITLE.YEAR.RESOLUTION.SOURCE.ENCODING-GROUP, but I'll need input on what porn/music/games titles usually looked like on RARBG; I'm not familiar with them.
I'll filter everything.7z and split it into its relevant categories. For example, the one I'm most excited for is a .txt file dedicated solely to 1080p BluRay x265 -RARBG movies.
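A rough sketch of that title filter, assuming movie titles really do follow TITLE.YEAR.RESOLUTION.SOURCE.ENCODING-GROUP. The regular expression and the helper names are guesses made for illustration; real titles (sources like WEB-DL, extra tags, shows, music, games) will need more cases.

```python
import re

# Assumed layout: TITLE.YEAR.RESOLUTION.SOURCE.ENCODING-GROUP
RELEASE_RE = re.compile(
    r"^(?P<title>.+?)\.(?P<year>(?:19|20)\d{2})\."
    r"(?P<resolution>\d{3,4}p)\."
    r"(?P<source>[A-Za-z0-9]+)\."
    r"(?P<encoding>[A-Za-z0-9.]+?)-(?P<group>[A-Za-z0-9]+)$"
)


def parse_release(name):
    """Return the named parts of a release title, or None if it does not fit."""
    match = RELEASE_RE.match(name)
    return match.groupdict() if match else None


def is_1080p_bluray_x265_rarbg(name):
    """True for titles that look like 1080p BluRay x265 -RARBG movies."""
    parts = parse_release(name)
    return (
        parts is not None
        and parts["resolution"] == "1080p"
        and parts["source"].lower() == "bluray"
        and parts["encoding"].lower() == "x265"
        and parts["group"] == "RARBG"
    )
```

Titles that return None from `parse_release` are the ones that would need a closer look, since they either came from somewhere else or use a format this pattern does not cover.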
Thanks guys.
This repository was mentioned on TorrentFreak - https://torrentfreak.com/rarbg-over-267000-movie-tv-show-magnet-links-appear-online-230601/
Much more info about this project can be found on Reddit - https://www.reddit.com/r/Piracy/comments/13wn554/my_rarbg_magnet_backup_268k/

@ -0,0 +1,70 @@
def fix(line, data):
    try:
        hash = line[20:[pos for pos, char in enumerate(line) if char == '&'][0]].lower()#hash is end of prefix to first '&', lowercased
    except IndexError:#if no '&dn='
        hash = line[20:]
        line = line + '&dn='
    try:
        int(hash, 16)#check if hash is hexadecimal
    except ValueError:
        return data#skip this magnet but keep what was collected so far
    if line.count('&') > 1:#look for trackers
        location = 0
        tocheck = []
        while location < len(line):#find all occurrences of '&'
            location = line.find('&', location)
            if location == -1:
                break
            tocheck.append(location)
            location += 1
        for index in tocheck:#iterate through occurrences of '&'
            try:
                if (line[index + 1] == 't') and (line[index + 2] == 'r') and (line[index + 3] == '='):#if occurrence is part of a tracker then ignore
                    pass
                else:#if not, it's part of the title so replace it
                    line = line[:index] + line[index + 1:]
            except IndexError:
                line = line[:index] + line[index + 1:]
        if line.count('&') > 1:#if it actually has only trackers now
            title = line[[pos for pos, char in enumerate(line) if char == '='][1] + 1:[pos for pos, char in enumerate(line) if char == '&'][1]]#title is second '=' to second '&'
        else:
            title = line[[pos for pos, char in enumerate(line) if char == '='][1] + 1:]#title is second '=' to end if no trackers
    else:
        title = line[[pos for pos, char in enumerate(line) if char == '='][1] + 1:]#title is second '=' to end if no trackers
    title = ''.join(char for char in title if ord(char) < 128)#strip non-ascii characters
    linesplit = ['magnet:?xt=urn:btih:', hash, '&dn=', title]
    data.append(linesplit)
    return data
data = []#lists within list
with open('everything.txt', encoding='utf-8') as file:#open file
    for line in file:
        line = line.strip()
        if line.startswith('magnet:?xt=urn:btih:'):#check for validity
            if 'magnet:?xt=urn:btih:' in line[20:]:#check for paste errors on my part
                secondline = line[line.find('magnet:?xt=urn:btih:', 20):]#the second magnet link in this line
                line = line[:line.find('magnet:?xt=urn:btih:', 20)]#the first magnet link in this line
                data = fix(secondline, data)#go ahead and add the second to data
            if 'magnetxturnbtih' in line[20:]:#paste errors that got symbols removed (and 'd' after the first '&', for some reason)
                hash = line[line.find('magnetxturnbtih', 20) + 15:line.find('n', line.find('magnetxturnbtih', 20) + 15)]#pull just the hash of the second magnet, which stretches from the end of the magnet prefix to the first occurrence of 'n' past the prefix
                title = line[line.find('n', line.find('magnetxturnbtih', 20) + 15) + 1:]#title stretches from that 'n' to the end (any trackers will be stripped out later)
                secondline = 'magnet:?xt=urn:btih:' + hash + '&dn=' + title#put it back together
                line = line[:line.find('magnetxturnbtih', 20)]
                data = fix(secondline, data)
            data = fix(line, data)#add split line to data
for magnet in data:
    for character in ['`', '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '[', '{', ']', '}', '\\', '|', ';', ':', '\'', '\"', ',', '<', '>', '?', '/']:
        magnet[3] = magnet[3].replace(character, '')#get rid of symbols except '.' and '-'
    magnet[3] = magnet[3].replace(' ', '.')#replace spaces
dic = {}#dictionary to eliminate duplicate hashes
for i in sorted(data, key=lambda x: x[3]):#sorted data because it lets me replace null titles because the last duplicate keeps the title and nulls are listed first in sort
    dic[i[0] + i[1]] = i[2] + i[3]
results = []
for value in sorted(dic, key=dic.get):#sort dictionary
    results.append('{}{}'.format(value, dic[value]))
with open('output.txt', 'a', encoding='utf-8') as output:
    for i in results:
        output.write(i + '\n')

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

@ -0,0 +1,2 @@
btc bc1q0zg588lsn69anj30rewnlmgup2rzq776ll7m6y
literally anything at all would be so appreciated

145
README.md

@ -2,145 +2,12 @@
> "I felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were suddenly silenced. I fear something terrible has happened."
rarbg.to closed this week on Wednesday, May 31, 2023 (2023-05-31).
rarbg.to closed permanently on Wednesday, May 31, 2023 (2023-05-31).
I've been collecting databases containing magnet links and torrent metadata since 2019 or so.
This is a collection of all publicly released Rarbg.to databases.
I had to stop collecting detailed torrent info for all sections except XXX due to my ISP threatening me, even for only collecting metadata (the complaint filing systems of the *AA organizations don't differentiate between real downloaders and metadata-only downloaders).
Anyhow, here are SQLite and Postgres dumps of everything I have relating to rbg. Their API used to be incredibly good until it broke a year or two ago, so there is even decent coverage going all the way back to 2006.
I hope this will be interesting and useful to folks. As far as I'm aware, this is the largest and most complete public rbg dataset archive in existence.
| Database | Type | Record Count | First | Last |
| ------------------------------- | -------- | ------------ | ---------- | ---------- |
| rbg-rls.db | SQLite | 784813 | 2007-06-07 | 2023-05-31 |
| rbg-rls-v2.db | SQLite | 687204 | 2007-07-15 | 2023-05-31 |
| pretime.rbg_magnet_metadata.sql | Postgres | 257696 | 2021-11-18 | 2023-05-31 |
Enjoy!
Sincerely yours,
@Sleaze
p.s. Pull-requests and questions are most appreciated and welcome!
## Dataset Info and Stats
### rbg-rls.db
Source: Rarbg RSS XML feed (https://rarbg.to/rssdd_magnet.php)
Schema:
```sql
CREATE TABLE "release" (
id STRING PRIMARY KEY,
name TEXT,
time TEXT,
epoch INTEGER,
files TEXT,
tags TEXT,
detail_link TEXT,
magnet TEXT
);
```
Record distribution:
| Year | Count |
| ---- | ------ |
| 2007 | 4 |
| 2008 | 14 |
| 2009 | 167 |
| 2010 | 35 |
| 2011 | 63 |
| 2012 | 201 |
| 2013 | 421 |
| 2014 | 316 |
| 2015 | 185 |
| 2016 | 213 |
| 2017 | 156 |
| 2018 | 430 |
| 2019 | 525 |
| 2020 | 136430 |
| 2021 | 265032 |
| 2022 | 269122 |
| 2023 | 111649 |
### rbg-rls-v2.db
Source: Rarbg JSON API
Schema:
```sql
CREATE TABLE release (
id STRING PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
time TEXT NOT NULL,
epoch INTEGER NOT NULL,
ranked INTEGER NOT NULL,
size INTEGER,
tags TEXT,
metadata TEXT,
magnet TEXT NOT NULL
);
```
Record distribution:
| Year | Count |
| ---- | ------ |
| 2007 | 2 |
| 2008 | 27 |
| 2009 | 36 |
| 2010 | 53 |
| 2011 | 317 |
| 2012 | 14222 |
| 2013 | 34451 |
| 2014 | 36582 |
| 2015 | 35714 |
| 2016 | 37323 |
| 2017 | 42813 |
| 2018 | 48348 |
| 2019 | 56419 |
| 2020 | 105135 |
| 2021 | 153566 |
| 2022 | 51593 |
| 2023 | 70603 |
### pretime.rbg_magnet_metadata.sql
Source: Torrent Swarms
Schema:
```sql
CREATE TABLE public.rbg_magnet_metadata (
id character varying(40) NOT NULL,
display_name character varying(256) NOT NULL,
dir_name character varying(256) NOT NULL,
size bigint NOT NULL,
num_files integer NOT NULL,
files text NOT NULL,
magnet_url text NOT NULL,
pretime timestamp without time zone NOT NULL,
epoch bigint NOT NULL,
created_at timestamp with time zone DEFAULT now() NOT NULL
);
```
Record distribution:
| Year | Count |
| ---- | ------ |
| 2021 | 143902 |
| 2022 | 83445 |
| 2023 | 30349 |
## See also
* https://github.com/2004content/rarbg
* [sleaze-rarbg](sleaze-rarbg)
* [past-8-years](past-8-years)
* [2004content-rarbg](2004content-rarbg)
* [ipfs-QmbpRxBZ5HDZDVRoeAU8xFYnoP4r5eGCxdkmfFW3JbA6mq](ipfs-QmbpRxBZ5HDZDVRoeAU8xFYnoP4r5eGCxdkmfFW3JbA6mq)

@ -0,0 +1,7 @@
# ipfs-QmbpRxBZ5HDZDVRoeAU8xFYnoP4r5eGCxdkmfFW3JbA6mq
## Source
https://ipfs.io/ipfs/QmbpRxBZ5HDZDVRoeAU8xFYnoP4r5eGCxdkmfFW3JbA6mq/
See also: [Original README.txt](README_orig.txt) for detailed instructions

@ -0,0 +1,26 @@
README
======
Starting the HTTP Server
------------------------
On Windows (using cmd.exe aka the "Command Prompt"):
.\redbean-2.2.com -s -D . -l 127.0.0.1 -p 7671
On Linux, macOS, and *BSD:
bash -c './redbean-2.2.com -s -D . -l 127.0.0.1 -p 7671'
Using
-----
Visit http://127.0.0.1:7671/ on your web browser.
Alternatives
------------
Alternatively, you may also upload the contents of this directory to
an existing web server or a static hosting provider (such as GitHub
pages, GitLab pages, Netlify, Cloudflare Pages, ...); self-hosting is
recommended!
--------------------
https://the-eye.eu/public/Random/rarbg/rarbg_db.zip
magnet:?xt=urn:btih:ulfihylx35oldftn7qosmk6hkhsjq5af

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@ -0,0 +1,515 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="referrer" content="no-referrer" />
<script defer src="dist/bundle.js"></script>
<title>RARBG IPFS</title>
<style>
/* http://meyerweb.com/eric/tools/css/reset/
v2.0 | 20110126
License: none (public domain)
*/
html,
body,
div,
span,
applet,
object,
iframe,
h1,
h2,
h3,
h4,
h5,
h6,
p,
blockquote,
a,
abbr,
acronym,
address,
big,
cite,
del,
dfn,
img,
ins,
kbd,
q,
s,
samp,
small,
strike,
strong,
sub,
sup,
var,
u,
i,
center,
dl,
dt,
dd,
li,
fieldset,
form,
label,
legend,
table,
caption,
tbody,
tfoot,
thead,
tr,
th,
td,
article,
aside,
canvas,
details,
embed,
figure,
figcaption,
footer,
header,
hgroup,
menu,
nav,
output,
ruby,
section,
summary,
time,
mark,
audio,
video {
margin: 0;
padding: 0;
border: 0;
font-size: 100%;
font: inherit;
vertical-align: baseline;
}
/* HTML5 display-role reset for older browsers */
article,
aside,
details,
figcaption,
figure,
footer,
header,
hgroup,
menu,
nav,
section {
display: block;
}
body {
line-height: 1;
}
blockquote,
q {
quotes: none;
}
blockquote:before,
blockquote:after,
q:before,
q:after {
content: '';
content: none;
}
table {
border-collapse: collapse;
border-spacing: 0;
}
/* apply a natural box layout model to all elements, but allowing components to change */
html {
box-sizing: border-box;
}
*,
*:before,
*:after {
box-sizing: inherit;
}
/* fonts */
html {
font-size: 100%;
}
/*16px*/
body {
background: white;
font-family: sans-serif;
font-weight: 400;
line-height: 1.3;
color: #000000;
}
p {
margin-bottom: 1rem;
}
h1,
h2,
h3,
h4,
h5 {
margin: 3rem 0 1.38rem;
font-family: sans-serif;
font-weight: bold;
line-height: 1.3;
}
h1 {
margin-top: 0;
font-size: 1.802rem;
}
h2 {
font-size: 1.602rem;
}
h3 {
font-size: 1.424rem;
}
h4 {
font-size: 1.266rem;
}
h5 {
font-size: 1.125rem;
}
small,
.text_small {
font-size: 0.889rem;
}
/* RARBG */
body {
font-family: arial;
}
.container {
display: flex;
flex-direction: column;
min-height: 100vh;
}
@media (min-width: 1024px) {
.container {
display: grid;
grid-template-columns: auto minmax(1000px, 80vw) auto;
grid-template-rows: auto 1fr auto;
line-height: 1.75;
}
}
header {
grid-column: span 3;
padding: 30px;
text-align: center;
font-size: 1.4em;
border-bottom: 1px solid brown;
color: brown;
}
main {
flex: 1;
padding: 20px;
text-align: center;
}
h1 {
margin-bottom: 0;
}
form {
display: inline-block;
width: 100%;
}
#formDiv {
margin-bottom: 2em;
}
input[type="search"] {
width: min(39em, 100%);
}
/* Source: https://stackoverflow.com/a/4485085/4466589 */
input[type="submit"],
input[type="search"] {
line-height: normal !important;
}
#welcome {
text-align: left;
width: min(78ch, 100%);
margin-left: auto;
margin-right: auto;
hyphens: auto;
}
#resultsDiv {
display: inline-grid;
text-align: left;
grid-template-columns: auto;
column-gap: 10px;
}
.category,
div.magnet {
display: none;
}
span.magnet {
display: inline;
}
.title a {
text-decoration: none;
font-weight: bold;
}
@media (min-width: 1024px) {
#resultsDiv {
grid-template-columns: auto auto auto auto;
}
.filesize {
margin-bottom: 0;
text-align: right;
}
.category,
div.magnet {
font-size: initial;
display: block;
}
div.magnet a {
text-decoration: none;
}
span.magnet {
display: none;
}
}
div.result {
margin-bottom: 1em;
}
div.result div:nth-child(1) a {
color: black;
text-decoration: none;
font-weight: bold;
}
div.result div:nth-child(2) {
color: gray;
}
button {
display: block;
margin-left: auto;
margin-right: auto;
margin-top: 1em;
}
tt {
font-family: monospace, monospace;
}
pre {
overflow: auto;
color: white;
background-color: black;
padding: 2px 1ch 2px 1ch;
font-family: monospace, monospace;
}
li pre {
margin-top: 0;
}
.tt {
font-family: monospace, monospace;
}
#update {
text-align: center;
margin-bottom: 1em;
}
.it {
font-style: italic;
}
</style>
<style>
/* https://www.digitalocean.com/community/tutorials/css-collapsible */
.collapsible-content {
max-height: 0px;
overflow: hidden;
}
.toggle:checked+.lbl-toggle+.collapsible-content {
max-height: initial;
}
input[type='checkbox'] {
display: none;
}
.lbl-toggle {
display: block;
font-weight: bold;
text-align: center;
cursor: pointer;
transition: all 0.25s ease-out;
}
.lbl-toggle::after {
content: ' ';
display: inline-block;
border-top: 5px solid transparent;
border-bottom: 5px solid transparent;
border-left: 5px solid currentColor;
vertical-align: middle;
margin-left: .7rem;
transform: translateY(-2px);
transition: transform .2s ease-out;
}
.toggle:checked+.lbl-toggle::after {
transform: rotate(90deg) translateX(-3px);
}
</style>
<style>
a {
color: black;
}
</style>
</head>
<body class="container">
<header>
<h1>RARBG <span title="InterPlanetary File System">IPFS</span></h1>
</header>
<nav>
</nav>
<main>
<div id="formDiv">
<form>
<fieldset disabled>
<input type="search" name="query" autofocus required spellcheck="false" autocorrect="off" />
<input type="submit" value="Initializing..." />
</fieldset>
</form>
</div>
<div id="welcome">
<div class="wrap-collabsible">
<input id="collapsible" class="toggle" type="checkbox">
<label for="collapsible" class="lbl-toggle">Getting Started</label>
<div class="collapsible-content">
<div class="content-inner">
<p>This is a <em>fully-decentralized</em> application to query the most complete dump of <a target="_blank" rel="noopener" href="https://en.wikipedia.org/wiki/RARBG">RARBG</a>, hosted
on <a target="_blank" rel="noopener" href="https://ipfs.tech/" title="InterPlanetary File System">IPFS</a>.</p>
<p>This application is a fork of <a target="_blank" rel="noopener" href="https://libgen.fun/dweb.html">Library Genesis IPFS</a>.</p>
<ul>
<li>Latency, especially on first use, can be greater than normal; this is expected.</li>
<li>Successive searches might fail due to rate-limiting by gateways.</li>
<li>You may require or prefer an <a target="_blank" rel="noopener"
href="https://ipfs.github.io/public-gateway-checker/">IPFS-to-HTTP(S) gateway</a>
due to lack of native IPFS support in mainstream browsers. Those gateways are a central
component that nullify the aims of decentralization, and can be censored just as easily
by your school/workplace, ISP, or your government.<ul>
<li>For a browser with built-in IPFS support, try <a target="_blank" rel="noopener"
href="https://brave.com/">Brave</a> or <a target="_blank" rel="noopener"
href="https://www.opera.com/">Opera</a>. Alternatively, you can also add
IPFS Companion extension to <a target="_blank" rel="noopener"
href="https://addons.mozilla.org/en-US/firefox/addon/ipfs-companion/">Firefox</a>
or <a target="_blank" rel="noopener"
href="https://chrome.google.com/webstore/detail/ipfs-companion/nibjojkomfdiaoajekhjakgkdhaomnch">Chrome</a>
(and Chrome derivatives, such as Edge or Vivaldi) <b>and</b> install <a
target="_blank" rel="noopener"
href="https://github.com/ipfs/ipfs-desktop#quick-install-shortcuts">IPFS
Desktop</a>.</li>
</ul>
</li>
</ul>
<h4>Fast local use</h4>
<p>You can download this page to search locally without constant internet access:</p>
<ol>
<li>Download recursively using <a target="_blank" rel="noopener"
href="https://eternallybored.org/misc/wget/">wget</a> (requires less than a GB of space):
<pre>wget -m -nH -np -P rarbg &lt;URL&gt;</pre>
<script>
// Replace <URL> in pre element
const pres = document.getElementsByTagName("pre");
const pre = pres[pres.length - 1];
pre.textContent = pre.textContent.replace("<URL>", window.location.toString());
</script>
</li>
<li>Start the HTTP server <a target="_blank" rel="noopener"
href="https://justine.lol/redbean/">redbean</a>:
<pre># On Windows:
.\redbean-2.2.com -s -D . -l 127.0.0.1 -p 7671
# On Linux, macOS, and *BSD:
bash -c './redbean-2.2.com -s -D . -l 127.0.0.1 -p 7671'</pre>
</li>
<li>Visit <a href="http://127.0.0.1:7671/">http://127.0.0.1:7671/</a> and voilà!</li>
</ol>
<p>You may also copy the <span class="tt">rarbg</span> folder to a USB drive or burn a CD for
your friends and
colleagues. :)</p>
<span style="display: none;"><a href="map.html"><span class="tt">map.html</span></a> is for
<span class="tt">wget</span>'s
spider...</span>
<h4>How does this work?</h4>
<p><a target="_blank" rel="noopener" href="https://sqlite.org">SQLite</a> compiled into <a
target="_blank" rel="noopener" href="https://webassembly.org/">WebAssembly</a> fetches
pages of the database hosted on IPFS through <a target="_blank" rel="noopener"
href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests">HTTP range
requests</a> using <a target="_blank" rel="noopener"
href="https://github.com/phiresky/sql.js-httpvfs">sql.js-httpvfs</a> layer, and then
evaluates your query in your browser.</p>
</div>
</div>
</div>
</div>
<div id="resultsDiv">
</div>
<button type="button" style="display: none;">Load More Results</button>
</main>
<aside>
</aside>
</body>
</html>

@ -0,0 +1,9 @@
<a href="./map.html"></a>
<a href="./index.html"></a>
<a href="./dist/257fb50677e11621f8a0.js"></a>
<a href="./dist/bundle.js"></a>
<a href="./dist/fd38fda4d9036372d1aa.wasm"></a>
<a href="./source.tar.gz"></a>
<a href="./rarbg_db_ipfs.sqlite"></a>
<a href="./redbean-2.2.com"></a>
<a href="./README.txt"></a>

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:15b5593a317559ea5a43734d9281384ab94fa4671ce4b49ebb691281caad88bb
size 414908416

45
past-8-years/README.md Normal file

@ -0,0 +1,45 @@
# past-8-years
## Source
magnet:?xt=urn:btih:ulfihylx35oldftn7qosmk6hkhsjq5af&dn=rarbg_db.zip&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce
(Discovered via the [comments in my Reddit post](https://www.reddit.com/r/trackers/comments/140ks0j/largest_public_rarbg_torrent_database_dump/) -> https://old.reddit.com/r/PiratedGames/comments/13wjasv/rarbg_torrents_shut_down/jmd5sbf/) - thank you /u/frozenpandaman!
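The info hash in the magnet link above is 32 characters of base32 rather than the more familiar 40 hex characters; both spellings are allowed in magnet links. If a tool only accepts hex, a conversion along these lines should work. The function name `btih_to_hex` is made up for this sketch.

```python
import base64


def btih_to_hex(info_hash):
    """Normalize a BitTorrent v1 info hash to 40 lowercase hex characters."""
    if len(info_hash) == 40:
        # Already hex; round-trip to validate and lowercase it.
        return bytes.fromhex(info_hash).hex()
    if len(info_hash) == 32:
        # Base32 form: 32 characters decode to the same 20 bytes.
        return base64.b32decode(info_hash.upper()).hex()
    raise ValueError("unexpected info hash length: {}".format(len(info_hash)))


print(btih_to_hex("ulfihylx35oldftn7qosmk6hkhsjq5af"))
```

Anything that is neither valid hex nor valid base32 raises a `ValueError`, so bad input fails loudly instead of producing a wrong hash.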
## Schema
```sql
CREATE TABLE IF NOT EXISTS "items" (
`id` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
`hash` TEXT NOT NULL UNIQUE,
`title` TEXT NOT NULL,
`dt` TEXT NOT NULL,
`cat` TEXT NOT NULL,
`size` INTEGER,
`ext_id` TEXT,
`imdb` TEXT
);
```
## DB Stats
| Year | Count |
| ---- | ------ |
| 2007 | 9691 |
| 2008 | 23751 |
| 2009 | 18699 |
| 2010 | 25715 |
| 2011 | 49015 |
| 2012 | 102570 |
| 2013 | 176003 |
| 2014 | 217855 |
| 2015 | 204541 |
| 2016 | 220476 |
| 2017 | 235374 |
| 2018 | 270960 |
| 2019 | 319705 |
| 2020 | 287172 |
| 2021 | 291272 |
| 2022 | 272801 |
| 2023 | 118091 |
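A minimal sketch of searching the `items` table and turning rows back into magnet links, using only the schema above. It assumes the `hash` column holds the info hash in a form that can be dropped straight into a magnet link, and the database file name in the usage note is a placeholder, not the real one.

```python
import sqlite3
from urllib.parse import quote


def find_magnets(conn, needle, limit=10):
    """Return magnet links for titles containing `needle`, newest first."""
    rows = conn.execute(
        "SELECT hash, title FROM items "
        "WHERE title LIKE ? ORDER BY dt DESC LIMIT ?",
        ("%" + needle + "%", limit),
    )
    # Assumption: `hash` is usable as-is after 'urn:btih:'.
    return [
        "magnet:?xt=urn:btih:{}&dn={}".format(info_hash, quote(title))
        for info_hash, title in rows
    ]
```

Usage would look something like `find_magnets(sqlite3.connect("path/to/the.sqlite"), "1080p")`, where the path is whatever the downloaded database is called locally.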

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:850eaa7f799b3d5e548ac2f12b93b401f8f87717ceef491965b20eddbba3b1c1
size 812711936

142
sleaze-rarbg/README.md Normal file

@ -0,0 +1,142 @@
# rarbg-db-dumps
> "I felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were suddenly silenced. I fear something terrible has happened."
rarbg.to closed this week on Wednesday, May 31, 2023 (2023-05-31).
I've been collecting databases containing magnet links and torrent metadata since 2019 or so.
I had to stop collecting detailed torrent info for all sections except XXX due to my ISP threatening me, even for only collecting metadata (the complaint filing systems of the *AA organizations don't differentiate between real downloaders and metadata-only downloaders).
Anyhow, here are SQLite and Postgres dumps of everything I have relating to rbg. Their API used to be incredibly good until it broke a year or two ago, so there is even decent coverage going all the way back to 2006.
I hope this will be interesting and useful to folks. As far as I'm aware, this is the largest and most complete public rbg dataset archive in existence.
| Database | Type | Record Count | First | Last |
| ------------------------------- | -------- | ------------ | ---------- | ---------- |
| rbg-rls.db | SQLite | 784813 | 2007-06-07 | 2023-05-31 |
| rbg-rls-v2.db | SQLite | 687204 | 2007-07-15 | 2023-05-31 |
| pretime.rbg_magnet_metadata.sql | Postgres | 257696 | 2021-11-18 | 2023-05-31 |
Enjoy!
Sincerely yours,
@Sleaze
p.s. Pull-requests and questions are most appreciated and welcome!
## Dataset Info and Stats
### rbg-rls.db
Source: Rarbg RSS XML feed (https://rarbg.to/rssdd_magnet.php)
Schema:
```sql
CREATE TABLE "release" (
id STRING PRIMARY KEY,
name TEXT,
time TEXT,
epoch INTEGER,
files TEXT,
tags TEXT,
detail_link TEXT,
magnet TEXT
);
```
Record distribution:
| Year | Count |
| ---- | ------ |
| 2007 | 4 |
| 2008 | 14 |
| 2009 | 167 |
| 2010 | 35 |
| 2011 | 63 |
| 2012 | 201 |
| 2013 | 421 |
| 2014 | 316 |
| 2015 | 185 |
| 2016 | 213 |
| 2017 | 156 |
| 2018 | 430 |
| 2019 | 525 |
| 2020 | 136430 |
| 2021 | 265032 |
| 2022 | 269122 |
| 2023 | 111649 |
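A per-year breakdown like the one above can probably be reproduced with a short query. This sketch assumes the `epoch` column holds Unix timestamps in seconds (UTC), which the schema suggests but does not confirm; `year_distribution` is just a name picked for the example.

```python
import sqlite3


def year_distribution(conn):
    """Return (year, count) rows for the "release" table, oldest year first."""
    sql = """
        SELECT strftime('%Y', epoch, 'unixepoch') AS year, COUNT(*) AS n
        FROM "release"
        GROUP BY year
        ORDER BY year
    """
    # Assumption: epoch is seconds since 1970-01-01 UTC.
    return conn.execute(sql).fetchall()
```

Usage: `year_distribution(sqlite3.connect("rbg-rls.db"))` with the database in the current directory. The same query should work against `rbg-rls-v2.db`, since that table has the same name and an `epoch` column as well.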
### rbg-rls-v2.db
Source: Rarbg JSON API
Schema:
```sql
CREATE TABLE release (
id STRING PRIMARY KEY NOT NULL,
name TEXT NOT NULL,
time TEXT NOT NULL,
epoch INTEGER NOT NULL,
ranked INTEGER NOT NULL,
size INTEGER,
tags TEXT,
metadata TEXT,
magnet TEXT NOT NULL
);
```
Record distribution:
| Year | Count |
| ---- | ------ |
| 2007 | 2 |
| 2008 | 27 |
| 2009 | 36 |
| 2010 | 53 |
| 2011 | 317 |
| 2012 | 14222 |
| 2013 | 34451 |
| 2014 | 36582 |
| 2015 | 35714 |
| 2016 | 37323 |
| 2017 | 42813 |
| 2018 | 48348 |
| 2019 | 56419 |
| 2020 | 105135 |
| 2021 | 153566 |
| 2022 | 51593 |
| 2023 | 70603 |
### pretime.rbg_magnet_metadata.sql
Source: Torrent Swarms
Schema:
```sql
CREATE TABLE public.rbg_magnet_metadata (
id character varying(40) NOT NULL,
display_name character varying(256) NOT NULL,
dir_name character varying(256) NOT NULL,
size bigint NOT NULL,
num_files integer NOT NULL,
files text NOT NULL,
magnet_url text NOT NULL,
pretime timestamp without time zone NOT NULL,
epoch bigint NOT NULL,
created_at timestamp with time zone DEFAULT now() NOT NULL
);
```
Record distribution:
| Year | Count |
| ---- | ------ |
| 2021 | 143902 |
| 2022 | 83445 |
| 2023 | 30349 |