This speeds up the resolved_* views by a reasonable amount, though it
does mean we won't be able to use the smarter resolution logic (which is
far too slow anyway at the moment, so I'm not sure what I'm going to do
about that in the future...)
Signed-off-by: Graham <gpe@openrs2.org>
This doesn't fix all cases, as a master index can have multiple sources,
each with a distinct copy of the same (archive, group, checksum,
version) tuple. However, it's probably as good as we'll be able to do
automatically - and it'll work particularly well for master indexes
downloaded directly over JS5, where we won't have done multiple imports
of the same cache.
Signed-off-by: Graham <gpe@openrs2.org>
These functions reduce the amount of group resolution logic
significantly, concentrating it in a single place. In addition to the
usual code de-duplication benefits, many of the queries are now much
simpler as the complexity is hidden behind the function calls.
This change also allows us to make the group resolution logic more
complicated. The first change is that the functions are guaranteed to
only return a single row, which was not true with the old JOIN-based
approach. The row is chosen deterministically.
The resolution logic will probably be improved in the future, so we can
make a better decision when collisions produce multiple candidate
groups.
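
A minimal sketch of the single-row guarantee, using Python's sqlite3
module rather than the production Postgres functions. The table layout,
the resolve_group helper and the tie-break on the lowest container_id
are all assumptions for illustration; the only properties taken from
this change are that one row is returned and that the choice is
deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE groups (
        archive_id INTEGER NOT NULL,
        group_id INTEGER NOT NULL,
        crc32 INTEGER NOT NULL,
        version INTEGER NOT NULL,
        container_id INTEGER NOT NULL
    );
    -- Two colliding candidates for the same (archive, group, crc32, version).
    INSERT INTO groups VALUES (2, 10, 1234, 5, 42), (2, 10, 1234, 5, 7);
""")


def resolve_group(archive_id, group_id, crc32, version):
    """Return at most one container_id for the tuple (hypothetical helper)."""
    row = conn.execute(
        """
        SELECT container_id FROM groups
        WHERE archive_id = ? AND group_id = ? AND crc32 = ? AND version = ?
        ORDER BY container_id  -- assumed tie-break; keeps the choice stable
        LIMIT 1
        """,
        (archive_id, group_id, crc32, version),
    ).fetchone()
    return row[0] if row else None


resolved = resolve_group(2, 10, 1234, 5)  # both candidates match; one wins
missing = resolve_group(2, 11, 1234, 5)   # no candidate at all
```

A plain JOIN against this table would have produced two rows for the
colliding tuple; the ORDER BY plus LIMIT 1 is one simple way to collapse
them to a single, repeatable result.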
Signed-off-by: Graham <gpe@openrs2.org>
There are a few collisions in the production archive. I suspect these
are due to poorly modified caches, and tracking the source(s) of each
group will make it easier to determine which cache is probably
problematic.
This change also has the benefit of removing a lot of the hacky source
name/description merging logic.
Signed-off-by: Graham <gpe@openrs2.org>
The user running the migration might not have superuser permissions.
Using IF NOT EXISTS allows a root user to install the extension
manually, and then the migration will succeed as an unprivileged user.
Signed-off-by: Graham <gpe@openrs2.org>
It isn't solved yet, but once we've committed to v1 of the archiving
service we won't want to edit V1__init.sql ever again, as doing so would
change its checksum (Flyway complains if the checksum changes).
Signed-off-by: Graham <gpe@openrs2.org>
Some non-empty loc groups are also unreachable, so I think this was
quite deceptive - e.g. on some OSRS revisions, we'll probably never hit
100% of the keys even if we exclude empty loc groups.
We can include the empty loc flag in the list of missing keys on the
per-cache pages instead.
Signed-off-by: Graham <gpe@openrs2.org>
The previous code over-counted as the use of LEFT JOINs meant candidate
group rows with the incorrect container_id were still included in the
results. Using an IN clause with a subquery allows us to remove those
rows, though it's a bit hacky. (Really I want to be able to use a JOIN
on the right side of a LEFT JOIN to restrict the rows that appear in the
results of the LEFT JOIN, but that doesn't seem to be possible.)
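
A minimal sketch of the over-count and the IN-based fix, run against
SQLite from Python rather than Postgres. The three tables and their
columns are simplified assumptions, not the real schema; only the shape
of the two queries matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE containers (id INTEGER PRIMARY KEY, crc32 INTEGER NOT NULL);
    CREATE TABLE groups (
        group_id INTEGER NOT NULL,
        version INTEGER NOT NULL,
        container_id INTEGER NOT NULL
    );
    CREATE TABLE index_groups (
        group_id INTEGER NOT NULL,
        version INTEGER NOT NULL,
        crc32 INTEGER NOT NULL
    );
    INSERT INTO containers VALUES (1, 111), (2, 222);
    -- The index expects group 0 with crc32 111 and group 1 with crc32 333.
    INSERT INTO index_groups VALUES (0, 1, 111), (1, 1, 333);
    -- Group 1 has a candidate row, but its container has the wrong crc32.
    INSERT INTO groups VALUES (0, 1, 1), (1, 1, 2);
""")

# Chained LEFT JOINs: the candidate for group 1 survives the first join
# even though no container with the expected crc32 matches it.
over_count = conn.execute("""
    SELECT COUNT(g.container_id)
    FROM index_groups ig
    LEFT JOIN groups g
        ON g.group_id = ig.group_id AND g.version = ig.version
    LEFT JOIN containers c
        ON c.id = g.container_id AND c.crc32 = ig.crc32
""").fetchone()[0]

# IN with a subquery: the container check moves into the first join's
# condition, so the bad candidate never appears in the results.
fixed_count = conn.execute("""
    SELECT COUNT(g.container_id)
    FROM index_groups ig
    LEFT JOIN groups g
        ON g.group_id = ig.group_id AND g.version = ig.version
        AND g.container_id IN (
            SELECT c.id FROM containers c WHERE c.crc32 = ig.crc32
        )
""").fetchone()[0]
```

With this data the chained version counts both groups as present, while
the IN version counts only group 0.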
Signed-off-by: Graham <gpe@openrs2.org>
This commit also adds support for populating the whirlpool column, and
ensures version is set to 0 for the ORIGINAL master index format.
Signed-off-by: Graham <gpe@openrs2.org>
I'm not sure if the auto-detection code works: I'm assuming that the new
format was introduced at the same time as the lengths flag in Js5Index,
but I haven't confirmed this.
Signed-off-by: Graham <gpe@openrs2.org>
This makes us behave like a standard client that only keeps a single
copy of each group in its cache. This ensures we can at least detect
(crc32, version) collisions for a particular group, rather than silently
skipping colliding cached groups.
A disadvantage is that this uses more bandwidth, especially if the
download is interrupted.
Signed-off-by: Graham <gpe@openrs2.org>
The CTE is now declared as NOT MATERIALIZED to ensure Postgres is able
to push the WHERE master_index_id condition inside it.
Signed-off-by: Graham <gpe@openrs2.org>
There's no real use for these yet, but they might be useful with NXT
caches.
We don't need a compressed_length column because it's easy to determine
the length of a BYTEA column within the database.
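
A minimal sketch of deriving the compressed length in the database
instead of storing it, using SQLite's length() on a BLOB from Python as
a stand-in. The containers table and data column here are assumptions;
in Postgres the equivalent on a BYTEA column would be length(data) or
octet_length(data).

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE containers (id INTEGER PRIMARY KEY, data BLOB NOT NULL)")

# Store a compressed payload without a separate compressed_length column.
payload = zlib.compress(b"example group contents" * 10)
conn.execute("INSERT INTO containers (id, data) VALUES (1, ?)", (payload,))

# For a BLOB, SQLite's length() returns the size in bytes.
stored_length = conn.execute(
    "SELECT length(data) FROM containers WHERE id = 1"
).fetchone()[0]
```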
Signed-off-by: Graham <gpe@openrs2.org>
Although it isn't necessary, we might as well, as it doesn't take up
much extra space and we already store all the properties for all groups
and files.
Signed-off-by: Graham <gpe@openrs2.org>