This doesn't fix all cases, as a master index can have multiple sources,
each with a distinct copy of the same (archive, group, checksum,
version) tuple. However, it's probably as good as we'll be able to do
automatically - and it'll work particularly well for master indexes
downloaded directly over JS5, where we won't have done multiple imports
of the same cache.
Signed-off-by: Graham <gpe@openrs2.org>
These functions reduce the amount of group resolution logic
significantly, concentrating it in a single place. In addition to the
usual code de-duplication benefits, many of the queries are now much
simpler as the complexity is hidden behind the function calls.
This change also allows us to make the group resolution logic more
complicated. The first change is that the functions are guaranteed to
only return a single row, which was not true with the old JOIN-based
approach. The row that is chosen is chosen deterministically.
The resolution logic will probably be improved in the future, so we can
make a better decision where there are multiple possible groups, due to
collisions.
Signed-off-by: Graham <gpe@openrs2.org>
There are a few collisions in the production archive. I suspect these
are due to poorly modified caches, and tracking the source(s) of each
group will make it easier to determine which cache is probably
problematic.
This change also has the benefit of removing a lot of the hacky source
name/description merging logic.
Signed-off-by: Graham <gpe@openrs2.org>
Postgres would otherwise deadlock forever. I'm not sure why this worked
when testing - perhaps we always had fewer than BATCH_SIZE rows?
Signed-off-by: Graham <gpe@openrs2.org>
We still want to merge the build and timestamp as caches can be
associated with multiple build numbers, and I always want us to use the
earliest number.
Signed-off-by: Graham <gpe@openrs2.org>
Some master indexes are used across multiple builds. It makes sense to
use the minimum build number, much like how we use the minimum
timestamp.
Signed-off-by: Graham <gpe@openrs2.org>
The user running the migration might not have superuser permissions.
Using IF NOT EXISTS allows a root user to install the extension
manually, and then the migration will succeed as an unprivileged user.
Signed-off-by: Graham <gpe@openrs2.org>
It isn't solved yet, but once we've committed to v1 of the archiving
service we won't want to edit V1__init.sql ever again to avoid changing
its checksum (Flyway complains if the checksum changes).
Signed-off-by: Graham <gpe@openrs2.org>
I'm going to try to minimise use of this (as per
https://github.com/google/guice/wiki/Avoid-Injecting-Closable-Resources).
For example, I'm going to inject a pooling DataSource rather than
Connection objects, as per the advice at the end of the page. However,
HikariCP's DataSource implementation is itself Closeable.
Signed-off-by: Graham <gpe@openrs2.org>
The previous code included too many rows as the use of LEFT JOINs meant
candidate group orws with an incorrect container_id were still included
in the results.
Using an IN clause with a subquery allows us to remove those rows,
though it's a bit hacky.
(Really I want to be able to use the JOIN on the right side of a LEFT
JOIN to restrict the rows that appear in the results of the LEFT JOIN,
but that doesn't seem to be possible.)
This is similar to the issue fixed by
a920570f04.
Signed-off-by: Graham <gpe@openrs2.org>