Add support for duplicate class names in the same ClassPath #111

Closed
opened 4 years ago by gpe · 6 comments
gpe commented 4 years ago
Owner

This will allow:

  • removal of the unpackclass_ and loader_ prefixing hack
  • the NameMap may be used to remap common unpackclass and client classes to the same name (e.g. the buffer, node and bzip2 classes)
  • (possibly as a long-term, low-priority goal) deobfuscating the unsigned signlink at the same time as the signed one, which will require joining up inherited member disjoint sets across libraries

Unfortunately it's going to require a massive change across everything that touches ASM in the codebase.

An initial rough plan is:

  • [x] give each Library a name (as a side effect, we can use this to improve the client detection in the static scrambling code)
  • [x] allow libraries to depend on other libraries
  • [x] add a special RuntimeLibrary for representing the standard JDK classes
  • [ ] add the library name to MemberRef
  • [ ] remove support for grabbing a class by name from ClassPath - instead, this must always go through a Library which can resolve the class by looking through its transitive dependencies
  • [ ] add library names to the @OriginalXXX annotations (or can we rely on the implicit names based on the .yaml file they're in?)
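
The resolution step in the plan above (a named library that looks a class up in itself and then in its transitive dependencies) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Library` class here is a hypothetical stand-in rather than the project's real class, the class body is a placeholder string where the real code would hold an ASM `ClassNode`, and `dependOn`, `add` and `resolve` are invented names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the project's actual Library class.
final class Library {
    final String name;
    private final List<Library> dependencies = new ArrayList<>();
    // Class name -> placeholder for the class body (a ClassNode in real code).
    private final Map<String, String> classes = new LinkedHashMap<>();

    Library(String name) {
        this.name = name;
    }

    void dependOn(Library other) {
        dependencies.add(other);
    }

    void add(String className) {
        classes.put(className, name + ":" + className);
    }

    // Resolve a class by name: this library's own classes win, then each
    // dependency is searched depth-first. Returns null if nothing matches.
    String resolve(String className) {
        return resolve(className, new HashSet<>());
    }

    private String resolve(String className, Set<Library> visited) {
        if (!visited.add(this)) {
            return null; // already searched this library via another path
        }
        String own = classes.get(className);
        if (own != null) {
            return own;
        }
        for (Library dependency : dependencies) {
            String found = dependency.resolve(className, visited);
            if (found != null) {
                return found;
            }
        }
        return null;
    }
}
```

With this shape, two libraries can each contain a class called `a` without clashing, because a lookup always starts from a specific library instead of from a flat, global name table.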
gpe added the asm, improvement, deobfuscator and patcher labels 4 years ago
Poster
Owner

ClassPath should probably be renamed to something like LibrarySet in the future, as each individual library will effectively manage its own classpath.

Poster
Owner

ClassMetadata::dependency could also be renamed, now that its meaning has changed such that it is only true if a class is from the runtime.

... or do we still need something like it? It is important to block renaming of overridden library methods that won't be remapped (as the library isn't in the ClassPath).

Perhaps remap() should walk the tree of libraries recursively?

Poster
Owner

Maybe the inherited member disjoint sets should be more aware of field/method resolution.

This would allow us to:

  • remove the inherited field disjoint set entirely
  • only include declared methods in the inherited method disjoint set
Poster
Owner

I've realised that the latter won't work, as field/method resolution is not deterministic - which is particularly problematic for interfaces (where two interfaces implemented by one class might contain a method with the same name and therefore must be renamed together).
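
To make the interface case concrete, here is a minimal union-find sketch. It is an assumption-laden illustration, not the project's disjoint set implementation: members are keyed by plain `owner.name` strings (descriptors are left out for brevity), and `MemberSets`, `find`, `union` and `sameSet` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal union-find over member keys of the form "owner.name".
final class MemberSets {
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of the set containing key, creating a
    // singleton set on first sight and compressing the path on the way.
    String find(String key) {
        parent.putIfAbsent(key, key);
        String root = key;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        String current = key;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    // Merges the sets containing a and b, so both must share one new name.
    void union(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (!rootA.equals(rootB)) {
            parent.put(rootA, rootB);
        }
    }

    boolean sameSet(String a, String b) {
        return find(a).equals(find(b));
    }
}
```

If a class `C` implements two otherwise unrelated interfaces `A` and `B` that both declare `run`, then uniting `C.run` with `A.run` and with `B.run` places all three in one set, so they are renamed together. Dropping the inherited (non-declared) entry for `C.run` would lose the only link between `A.run` and `B.run`.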

Poster
Owner

Another alternative to the above: take the class prefixing hack further and prefix everything as we read it and strip the prefixes just before we write it.

So we'd have a sequence like:

a => client$a => client$Class1 => client$Node => Node (not ideal as it conflicts with inner classes, so the log messages might cause confusion)

or:

a => client/a => client/Class1 => client/Node => Node (which would make renaming packaged classes easier, but doesn't stand out as much as being "special" syntax)

Perhaps combining both, we could have something like $client/a, client$/a or $client$/a. $client$/a perhaps stands out the most.

This means we wouldn't need to change the guts of the ASM library, bundler and deobfuscator: just the first and last stages of the deobfuscator.

I like this better than only renaming a selection of the libraries (as we do now): it becomes unambiguous.

The @OriginalXXX syntax could also understand the names, and extract the library part out into a separate field.

It also means we could remove the Library class entirely (except for reading/writing classes?) and the ClassPath could store a list of ClassNodes directly. This would allow us to remove an argument from a bunch of transformer methods and also remove a level of nesting from a bunch of loops.
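
The read-time prefixing and write-time stripping described above might look roughly like this. It is a sketch under assumptions: it uses the `$client$/a` form, which is only one of the candidate syntaxes floated in this comment, and `LibraryPrefix`, `add`, `strip` and `library` are hypothetical helper names.

```java
// Hypothetical helpers for the "$library$/name" prefixing scheme.
final class LibraryPrefix {
    private LibraryPrefix() {
    }

    // Applied when a class is read: "a" in library "client" -> "$client$/a".
    static String add(String library, String className) {
        return "$" + library + "$/" + className;
    }

    // Applied just before writing: "$client$/Node" -> "Node".
    // Names without a well-formed prefix are returned unchanged.
    static String strip(String prefixed) {
        int end = prefixEnd(prefixed);
        return end < 0 ? prefixed : prefixed.substring(end + 2);
    }

    // Extracts the library part, e.g. for a separate annotation field.
    // Returns null if the name carries no prefix.
    static String library(String prefixed) {
        int end = prefixEnd(prefixed);
        return end < 0 ? null : prefixed.substring(1, end);
    }

    // Index of the "$/" that terminates the prefix, or -1 if there is none.
    private static int prefixEnd(String name) {
        if (!name.startsWith("$")) {
            return -1;
        }
        return name.indexOf("$/", 1);
    }
}
```

Because every class carries its library in its name for the whole pipeline, two classes called `a` from different libraries stay distinct, and only the first and last stages need to know about the prefix.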

Poster
Owner

I've gone with the prefixing solution - see b1bc7377fce8f80f8aee19efd5cca1a6e48dd76d

gpe closed this issue 4 years ago