第 12 章 序列化

【85】Java 序列化的替代方案

Prefer alternatives to Java serialization

When serialization was added to Java in 1997, it was known to be somewhat risky. The approach had been tried in a research language (Modula-3) but never in a production language. While the promise of distributed objects with little effort on the part of the programmer was appealing, the price was invisible constructors and blurred lines between API and implementation, with the potential for problems with correctness, performance, security, and maintenance. Proponents believed the benefits outweighed the risks, but history has shown otherwise.

当序列化在 1997 年添加到 Java 中时,它被认为有一定的风险。这种方法曾在研究语言(Modula-3)中尝试过,但从未在生产语言中使用过。虽然程序员不费什么力气就能实现分布式对象,这一点很吸引人,但代价也不小,如:不可见的构造函数、API 与实现之间模糊的界线,还可能会出现正确性、性能、安全性和维护方面的问题。支持者认为收益大于风险,但历史证明并非如此。

The security issues described in previous editions of this book turned out to be every bit as serious as some had feared. The vulnerabilities discussed in the early 2000s were transformed into serious exploits over the next decade, famously including a ransomware attack on the San Francisco Metropolitan Transit Agency Municipal Railway (SFMTA Muni) that shut down the entire fare collection system for two days in November 2016 [Gallagher16].

在本书之前的版本中描述的安全问题,和人们担心的一样严重。21 世纪初仅停留在讨论的漏洞在接下来的 10 年间变成了真实严重的漏洞,其中最著名的包括 2016 年 11 月对旧金山大都会运输署市政铁路(SFMTA Muni)的勒索软件攻击,导致整个收费系统关闭了两天 [Gallagher16]。

A fundamental problem with serialization is that its attack surface is too big to protect, and constantly growing: Object graphs are deserialized by invoking the readObject method on an ObjectInputStream. This method is essentially a magic constructor that can be made to instantiate objects of almost any type on the class path, so long as the type implements the Serializable interface. In the process of deserializing a byte stream, this method can execute code from any of these types, so the code for all of these types is part of the attack surface.

序列化的一个根本问题是它的可攻击范围太大,且难以保护,而且问题还在不断增多:通过调用 ObjectInputStream 上的 readObject 方法反序列化对象图。这个方法本质上是一个神奇的构造函数,可以用来实例化类路径上几乎任何类型的对象,只要该类型实现 Serializable 接口。在反序列化字节流的过程中,此方法可以执行来自任何这些类型的代码,因此所有这些类型的代码都在攻击范围内。

The attack surface includes classes in the Java platform libraries, in third-party libraries such as Apache Commons Collections, and in the application itself. Even if you adhere to all of the relevant best practices and succeed in writing serializable classes that are invulnerable to attack, your application may still be vulnerable. To quote Robert Seacord, technical manager of the CERT Coordination Center:

攻击可涉及 Java 平台库、第三方库(如 Apache Commons collection)和应用程序本身中的类。即使坚持履行实践了所有相关的最佳建议,并成功地编写了不受攻击的可序列化类,应用程序仍然可能是脆弱的。引用 CERT 协调中心技术经理 Robert Seacord 的话:

Java deserialization is a clear and present danger as it is widely used both directly by applications and indirectly by Java subsystems such as RMI (Remote Method Invocation), JMX (Java Management Extension), and JMS (Java Messaging System). Deserialization of untrusted streams can result in remote code execution (RCE), denial-of-service (DoS), and a range of other exploits. Applications can be vulnerable to these attacks even if they did nothing wrong. [Seacord17]

Java 反序列化是一个明显且真实的危险源,因为它被应用程序直接和间接地广泛使用,比如 RMI(远程方法调用)、JMX(Java 管理扩展)和 JMS(Java 消息传递系统)。不可信流的反序列化可能导致远程代码执行(RCE)、拒绝服务(DoS)和一系列其他攻击。应用程序很容易受到这些攻击,即使它们本身没有错误。[Seacord17]

Attackers and security researchers study the serializable types in the Java libraries and in commonly used third-party libraries, looking for methods invoked during deserialization that perform potentially dangerous activities. Such methods are known as gadgets. Multiple gadgets can be used in concert, to form a gadget chain. From time to time, a gadget chain is discovered that is sufficiently powerful to allow an attacker to execute arbitrary native code on the underlying hardware, given only the opportunity to submit a carefully crafted byte stream for deserialization. This is exactly what happened in the SFMTA Muni attack. This attack was not isolated. There have been others, and there will be more.

攻击者和安全研究人员研究 Java 库和常用的第三方库中的可序列化类型,寻找在反序列化过程中调用的潜在危险活动的方法称为 gadget。多个小工具可以同时使用,形成一个小工具链。偶尔会发现一个小部件链,它的功能足够强大,允许攻击者在底层硬件上执行任意的本机代码,允许提交精心设计的字节流进行反序列化。这正是 SFMTA Muni 袭击中发生的事情。这次袭击并不是孤立的。不仅已经存在,而且还会有更多。

Without using any gadgets, you can easily mount a denial-of-service attack by causing the deserialization of a short stream that requires a long time to deserialize. Such streams are known as deserialization bombs [Svoboda16]. Here’s an example by Wouter Coekaerts that uses only hash sets and a string [Coekaerts15]:

不使用任何 gadget,你都可以通过对需要很长时间才能反序列化的短流进行反序列化,轻松地发起拒绝服务攻击。这种流被称为反序列化炸弹 [Svoboda16]。下面是 Wouter Coekaerts 的一个例子,它只使用哈希集和字符串 [Coekaerts15]:

  1. // Deserialization bomb - deserializing this stream takes forever
  2. static byte[] bomb() {
  3. Set<Object> root = new HashSet<>();
  4. Set<Object> s1 = root;
  5. Set<Object> s2 = new HashSet<>();
  6. for (int i = 0; i < 100; i++) {
  7. Set<Object> t1 = new HashSet<>();
  8. Set<Object> t2 = new HashSet<>();
  9. t1.add("foo"); // Make t1 unequal to t2
  10. s1.add(t1); s1.add(t2);
  11. s2.add(t1); s2.add(t2);
  12. s1 = t1;
  13. s2 = t2;
  14. }
  15. return serialize(root); // Method omitted for brevity
  16. }

The object graph consists of 201 HashSet instances, each of which contains 3 or fewer object references. The entire stream is 5,744 bytes long, yet the sun would burn out long before you could deserialize it. The problem is that deserializing a HashSet instance requires computing the hash codes of its elements. The 2 elements of the root hash set are themselves hash sets containing 2 hash-set elements, each of which contains 2 hash-set elements, and so on, 100 levels deep. Therefore, deserializing the set causes the hashCode method to be invoked over 2100 times. Other than the fact that the deserialization is taking forever, the deserializer has no indication that anything is amiss. Few objects are produced, and the stack depth is bounded.

对象图由 201 个 HashSet 实例组成,每个实例包含 3 个或更少的对象引用。整个流的长度为 5744 字节,但是在你对其进行反序列化之前,资源就已经耗尽了。问题在于,反序列化 HashSet 实例需要计算其元素的哈希码。根哈希集的 2 个元素本身就是包含 2 个哈希集元素的哈希集,每个哈希集元素包含 2 个哈希集元素,以此类推,深度为 100。因此,反序列化 Set 会导致 hashCode 方法被调用超过 2100 次。除了反序列化会持续很长时间之外,反序列化器没有任何错误的迹象。生成的对象很少,并且堆栈深度是有界的。

So what can you do defend against these problems? You open yourself up to attack whenever you deserialize a byte stream that you don’t trust. The best way to avoid serialization exploits is never to deserialize anything. In the words of the computer named Joshua in the 1983 movie WarGames, “the only winning move is not to play.” There is no reason to use Java serialization in any new system you write. There are other mechanisms for translating between objects and byte sequences that avoid many of the dangers of Java serialization, while offering numerous advantages, such as cross-platform support, high performance, a large ecosystem of tools, and a broad community of expertise. In this book, we refer to these mechanisms as cross-platform structured-data representations. While others sometimes refer to them as serialization systems, this book avoids that usage to prevent confusion with Java serialization.

那么你能做些什么来抵御这些问题呢?当你反序列化一个你不信任的字节流时,你就会受到攻击。避免序列化利用的最好方法是永远不要反序列化任何东西。 用 1983 年电影《战争游戏》(WarGames)中名为约书亚(Joshua)的电脑的话来说,「唯一的制胜绝招就是不玩。」没有理由在你编写的任何新系统中使用 Java 序列化。 还有其他一些机制可以在对象和字节序列之间进行转换,从而避免了 Java 序列化的许多危险,同时还提供了许多优势,比如跨平台支持、高性能、大量工具和广泛的专家社区。在本书中,我们将这些机制称为跨平台结构数据表示。虽然其他人有时将它们称为序列化系统,但本书避免使用这种说法,以免与 Java 序列化混淆。

What these representations have in common is that they’re far simpler than Java serialization. They don’t support automatic serialization and deserialization of arbitrary object graphs. Instead, they support simple, structured data-objects consisting of a collection of attribute-value pairs. Only a few primitive and array data types are supported. This simple abstraction turns out to be sufficient for building extremely powerful distributed systems and simple enough to avoid the serious problems that have plagued Java serialization since its inception.

以上所述技术的共同点是它们比 Java 序列化简单得多。它们不支持任意对象图的自动序列化和反序列化。相反,它们支持简单的结构化数据对象,由一组「属性-值」对组成。只有少数基本数据类型和数组数据类型得到支持。事实证明,这个简单的抽象足以构建功能极其强大的分布式系统,而且足够简单,可以避免 Java 序列化从一开始就存在的严重问题。

The leading cross-platform structured data representations are JSON [JSON] and Protocol Buffers, also known as protobuf [Protobuf]. JSON was designed by Douglas Crockford for browser-server communication, and protocol buffers were designed by Google for storing and interchanging structured data among its servers. Even though these representations are sometimes called languageneutral, JSON was originally developed for JavaScript and protobuf for C++; both representations retain vestiges of their origins.

领先的跨平台结构化数据表示是 JSON 和 Protocol Buffers,也称为 protobuf。JSON 由 Douglas Crockford 设计用于浏览器与服务器通信,Protocol Buffers 由谷歌设计用于在其服务器之间存储和交换结构化数据。尽管这些技术有时被称为「中性语言」,但 JSON 最初是为 JavaScript 开发的,而 protobuf 是为 c++ 开发的;这两种技术都保留了其起源的痕迹。

The most significant differences between JSON and protobuf are that JSON is text-based and human-readable, whereas protobuf is binary and substantially more efficient; and that JSON is exclusively a data representation, whereas protobuf offers schemas (types) to document and enforce appropriate usage. Although protobuf is more efficient than JSON, JSON is extremely efficient for a text-based representation. And while protobuf is a binary representation, it does provide an alternative text representation for use where human-readability is desired (pbtxt).

JSON 和 protobuf 之间最显著的区别是 JSON 是基于文本的,并且是人类可读的,而 protobuf 是二进制的,但效率更高;JSON 是一种专门的数据表示,而 protobuf 提供模式(类型)来记录和执行适当的用法。虽然 protobuf 比 JSON 更有效,但是 JSON 对于基于文本的表示非常有效。虽然 protobuf 是一种二进制表示,但它确实提供了另一种文本表示,可用于需要具备人类可读性的场景(pbtxt)。

If you can’t avoid Java serialization entirely, perhaps because you’re working in the context of a legacy system that requires it, your next best alternative is to never deserialize untrusted data. In particular, you should never accept RMI traffic from untrusted sources. The official secure coding guidelines for Java say “Deserialization of untrusted data is inherently dangerous and should be avoided.” This sentence is set in large, bold, italic, red type, and it is the only text in the entire document that gets this treatment [Java-secure].

如果你不能完全避免 Java 序列化,可能是因为你需要在遗留系统环境中工作,那么你的下一个最佳选择是 永远不要反序列化不可信的数据。 特别要注意,你不应该接受来自不可信来源的 RMI 流量。Java 的官方安全编码指南说:「反序列化不可信的数据本质上是危险的,应该避免。」这句话是用大号、粗体、斜体和红色字体设置的,它是整个文档中唯一得到这种格式处理的文本。[Java-secure]

If you can’t avoid serialization and you aren’t absolutely certain of the safety of the data you’re deserializing, use the object deserialization filtering added in Java 9 and backported to earlier releases (java.io.ObjectInputFilter). This facility lets you specify a filter that is applied to data streams before they’re deserialized. It operates at the class granularity, letting you accept or reject certain classes. Accepting classes by default and rejecting a list of potentially dangerous ones is known as blacklisting; rejecting classes by default and accepting a list of those that are presumed safe is known as whitelisting. Prefer whitelisting to blacklisting, as blacklisting only protects you against known threats. A tool called Serial Whitelist Application Trainer (SWAT) can be used to automatically prepare a whitelist for your application [Schneider16]. The filtering facility will also protect you against excessive memory usage, and excessively deep object graphs, but it will not protect you against serialization bombs like the one shown above.

如果无法避免序列化,并且不能绝对确定反序列化数据的安全性,那么可以使用 Java 9 中添加的对象反序列化筛选,并将其移植到早期版本(java.io.ObjectInputFilter)。该工具允许你指定一个过滤器,该过滤器在反序列化数据流之前应用于数据流。它在类粒度上运行,允许你接受或拒绝某些类。默认接受所有类,并拒绝已知潜在危险类的列表称为黑名单;在默认情况下拒绝其他类,并接受假定安全的类的列表称为白名单。优先选择白名单而不是黑名单, 因为黑名单只保护你免受已知的威胁。一个名为 Serial Whitelist Application Trainer(SWAT)的工具可用于为你的应用程序自动准备一个白名单 [Schneider16]。过滤工具还将保护你免受过度内存使用和过于深入的对象图的影响,但它不能保护你免受如上面所示的序列化炸弹的影响。

Unfortunately, serialization is still pervasive in the Java ecosystem. If you are maintaining a system that is based on Java serialization, seriously consider migrating to a cross-platform structured-data representation, even though this may be a time-consuming endeavor. Realistically, you may still find yourself having to write or maintain a serializable class. It requires great care to write a serializable class that is correct, safe, and efficient. The remainder of this chapter provides advice on when and how to do this.

不幸的是,序列化在 Java 生态系统中仍然很普遍。如果你正在维护一个基于 Java 序列化的系统,请认真考虑迁移到跨平台的结构化数据,尽管这可能是一项耗时的工作。实际上,你可能仍然需要编写或维护一个可序列化的类。编写一个正确、安全、高效的可序列化类需要非常小心。本章的其余部分将提供何时以及如何进行此操作的建议。

In summary, serialization is dangerous and should be avoided. If you are designing a system from scratch, use a cross-platform structured-data representation such as JSON or protobuf instead. Do not deserialize untrusted data. If you must do so, use object deserialization filtering, but be aware that it is not guaranteed to thwart all attacks. Avoid writing serializable classes. If you must do so, exercise great caution.

总之,序列化是危险的,应该避免。如果你从头开始设计一个系统,可以使用跨平台的结构化数据,如 JSON 或 protobuf。不要反序列化不可信的数据。如果必须这样做,请使用对象反序列化过滤,但要注意,它不能保证阻止所有攻击。避免编写可序列化的类。如果你必须这样做,一定要非常小心。


【86】非常谨慎地实现 Serializable

Implement Serializable with great caution

Allowing a class’s instances to be serialized can be as simple as adding the words implements Serializable to its declaration. Because this is so easy to do, there was a common misconception that serialization requires little effort on the part of the programmer. The truth is far more complex. While the immediate cost to make a class serializable can be negligible, the long-term costs are often substantial.

使类的实例可序列化非常简单,只需实现 Serializable 接口即可。因为这很容易做到,所以有一个普遍的误解,认为序列化只需要程序员付出很少的努力。而事实上要复杂得多。虽然使类可序列化的即时代价可以忽略不计,但长期代价通常是巨大的。

A major cost of implementing Serializable is that it decreases the flexibility to change a class’s implementation once it has been released. When a class implements Serializable, its byte-stream encoding (or serialized form) becomes part of its exported API. Once you distribute a class widely, you are generally required to support the serialized form forever, just as you are required to support all other parts of the exported API. If you do not make the effort to design a custom serialized form but merely accept the default, the serialized form will forever be tied to the class’s original internal representation. In other words, if you accept the default serialized form, the class’s private and package-private instance fields become part of its exported API, and the practice of minimizing access to fields (Item 15) loses its effectiveness as a tool for information hiding.

实现 Serializable 接口的一个主要代价是,一旦类的实现被发布,它就会降低更改该类实现的灵活性。 当类实现 Serializable 时,其字节流编码(或序列化形式)成为其导出 API 的一部分。一旦广泛分发了一个类,通常就需要永远支持序列化的形式,就像需要支持导出 API 的所有其他部分一样。如果你不努力设计自定义序列化形式,而只是接受默认形式,则序列化形式将永远绑定在类的原始内部实现上。换句话说,如果你接受默认的序列化形式,类中私有的包以及私有实例字段将成为其导出 API 的一部分,此时最小化字段作用域(Item-15)作为信息隐藏的工具,将失去其有效性。

If you accept the default serialized form and later change a class’s internal representation, an incompatible change in the serialized form will result. Clients attempting to serialize an instance using an old version of the class and deserialize it using the new one (or vice versa) will experience program failures. It is possible to change the internal representation while maintaining the original serialized form (using ObjectOutputStream.putFields and ObjectInputStream.readFields), but it can be difficult and leaves visible warts in the source code. If you opt to make a class serializable, you should carefully design a high-quality serialized form that you’re willing to live with for the long haul (Items 87, 90). Doing so will add to the initial cost of development, but it’s worth the effort. Even a well-designed serialized form places constraints on the evolution of a class; an ill-designed serialized form can be crippling.

如果你接受默认的序列化形式,然后更改了类的内部实现,则会导致与序列化形式不兼容。试图使用类的旧版本序列化实例,再使用新版本反序列化实例的客户端(反之亦然)程序将会失败。当然,可以在维护原始序列化形式的同时更改内部实现(使用 ObjectOutputStream.putFields 或 ObjectInputStream.readFields),但这可能会很困难,并在源代码中留下明显的缺陷。如果你选择使类可序列化,你应该仔细设计一个高质量的序列化形式,以便长期使用(Item-87、Item-90)。这样做会增加开发的初始成本,但是这样做是值得的。即使是设计良好的序列化形式,也会限制类的演化;而设计不良的序列化形式,则可能会造成严重后果。

A simple example of the constraints on evolution imposed by serializability concerns stream unique identifiers, more commonly known as serial version UIDs. Every serializable class has a unique identification number associated with it. If you do not specify this number by declaring a static final long field named serialVersionUID, the system automatically generates it at runtime by applying a cryptographic hash function (SHA-1) to the structure of the class. This value is affected by the names of the class, the interfaces it implements, and most of its members, including synthetic members generated by the compiler. If you change any of these things, for example, by adding a convenience method, the generated serial version UID changes. If you fail to declare a serial version UID, compatibility will be broken, resulting in an InvalidClassException at runtime.

可序列化会使类的演变受到限制,施加这种约束的一个简单示例涉及流的唯一标识符,通常称其为串行版本 UID。每个可序列化的类都有一个与之关联的唯一标识符。如果你没有通过声明一个名为 serialVersionUID 的静态 final long 字段来指定这个标识符,那么系统将在运行时对类应用加密散列函数(SHA-1)自动生成它。这个值受到类的名称、实现的接口及其大多数成员(包括编译器生成的合成成员)的影响。如果你更改了其中任何一项,例如,通过添加一个临时的方法,生成的序列版本 UID 就会更改。如果你未能声明序列版本 UID,兼容性将被破坏,从而在运行时导致 InvalidClassException。

A second cost of implementing Serializable is that it increases the likelihood of bugs and security holes (Item 85). Normally, objects are created with constructors; serialization is an extralinguistic mechanism for creating objects. Whether you accept the default behavior or override it, deserialization is a “hidden constructor” with all of the same issues as other constructors. Because there is no explicit constructor associated with deserialization, it is easy to forget that you must ensure that it guarantees all of the invariants established by the constructors and that it does not allow an attacker to gain access to the internals of the object under construction. Relying on the default deserialization mechanism can easily leave objects open to invariant corruption and illegal access (Item 88).

实现 Serializable 接口的第二个代价是,增加了出现 bug 和安全漏洞的可能性(第 85 项)。 通常,对象是用构造函数创建的;序列化是一种用于创建对象的超语言机制。无论你接受默认行为还是无视它,反序列化都是一个「隐藏构造函数」,其他构造函数具有的所有问题它都有。由于没有与反序列化关联的显式构造函数,因此很容易忘记必须让它能够保证所有的不变量都是由构造函数建立的,并且不允许攻击者访问正在构造的对象内部。依赖于默认的反序列化机制,会让对象轻易地遭受不变性破坏和非法访问(Item-88)。

A third cost of implementing Serializable is that it increases the testing burden associated with releasing a new version of a class. When a serializable class is revised, it is important to check that it is possible to serialize an instance in the new release and deserialize it in old releases, and vice versa. The amount of testing required is thus proportional to the product of the number of serializable classes and the number of releases, which can be large. You must ensure both that the serialization-deserialization process succeeds and that it results in a faithful replica of the original object. The need for testing is reduced if a custom serialized form is carefully designed when the class is first written (Items 87, 90).

实现 Serializable 接口的第三个代价是,它增加了与发布类的新版本相关的测试负担。 当一个可序列化的类被修改时,重要的是检查是否可以在新版本中序列化一个实例,并在旧版本中反序列化它,反之亦然。因此,所需的测试量与可序列化类的数量及版本的数量成正比,工作量可能很大。你必须确保「序列化-反序列化」过程成功,并确保它生成原始对象的无差错副本。如果在第一次编写类时精心设计了自定义序列化形式,那么测试的工作量就会减少(Item-87、Item-90)。

Implementing Serializable is not a decision to be undertaken lightly. It is essential if a class is to participate in a framework that relies on Java serialization for object transmission or persistence. Also, it greatly eases the use of a class as a component in another class that must implement Serializable. There are, however, many costs associated with implementing Serializable. Each time you design a class, weigh the costs against the benefits. Historically, value classes such as BigInteger and Instant implemented Serializable, and collection classes did too. Classes representing active entities, such as thread pools, should rarely implement Serializable.

实现 Serializable 接口并不是一个轻松的决定。 如果一个类要参与一个框架,该框架依赖于 Java 序列化来进行对象传输或持久化,这对于类来说实现 Serializable 接口就是非常重要的。此外,如果类 A 要成为另一个类 B 的一个组件,类 B 必须实现 Serializable 接口,若类 A 可序列化,它就会更易于被使用。然而,与实现 Serializable 相关的代价很多。每次设计一个类时,都要权衡利弊。历史上,像 BigInteger 和 Instant 这样的值类实现了 Serializable 接口,集合类也实现了 Serializable 接口。表示活动实体(如线程池)的类很少情况适合实现 Serializable 接口。

Classes designed for inheritance (Item 19) should rarely implement Serializable, and interfaces should rarely extend it. Violating this rule places a substantial burden on anyone who extends the class or implements the interface. There are times when it is appropriate to violate the rule. For example, if a class or interface exists primarily to participate in a framework that requires all participants to implement Serializable, then it may make sense for the class or interface to implement or extend Serializable.

为继承而设计的类(Item-19)很少情况适合实现 Serializable 接口,接口也很少情况适合扩展它。 违反此规则会给扩展类或实现接口的任何人带来很大的负担。有时,违反规则是恰当的。例如,如果一个类或接口的存在主要是为了参与一个要求所有参与者都实现 Serializable 接口的框架,那么类或接口实现或扩展 Serializable 可能是有意义的。

Classes designed for inheritance that do implement Serializable include Throwable and Component. Throwable implements Serializable so RMI can send exceptions from server to client. Component implements Serializable so GUIs can be sent, saved, and restored, but even in the heyday of Swing and AWT, this facility was little-used in practice.

在为了继承而设计的类中,Throwable 类和 Component 类都实现了 Serializable 接口。正是因为 Throwable 实现了 Serializable 接口,RMI 可以将异常从服务器发送到客户端;Component 类实现了 Serializable 接口,因此可以发送、保存和恢复 GUI,但即使在 Swing 和 AWT 的鼎盛时期,这个工具在实践中也很少使用。

If you implement a class with instance fields that is both serializable and extendable, there are several risks to be aware of. If there are any invariants on the instance field values, it is critical to prevent subclasses from overriding the finalize method, which the class can do by overriding finalize and declaring it final. Otherwise, the class will be susceptible to finalizer attacks (Item 8). Finally, if the class has invariants that would be violated if its instance fields were initialized to their default values (zero for integral types, false for boolean, and null for object reference types), you must add this readObjectNoData method:

如果你实现了一个带有实例字段的类,它同时是可序列化和可扩展的,那么需要注意几个风险。如果实例字段值上有任何不变量,关键是要防止子类覆盖 finalize 方法,可以通过覆盖 finalize 并声明它为 final 来做到。最后,如果类的实例字段初始化为默认值(整数类型为 0,布尔值为 false,对象引用类型为 null),那么必须添加 readObjectNoData 方法:

  1. // readObjectNoData for stateful extendable serializable classes
  2. private void readObjectNoData() throws InvalidObjectException {
  3. throw new InvalidObjectException("Stream data required");
  4. }

This method was added in Java 4 to cover a corner case involving the addition of a serializable superclass to an existing serializable class [Serialization, 3.5].

这个方法是在 Java 4 中添加的,涉及将可序列化超类添加到现有可序列化类 [Serialization, 3.5] 的特殊情况。

There is one caveat regarding the decision not to implement Serializable. If a class designed for inheritance is not serializable, it may require extra effort to write a serializable subclass. Normal deserialization of such a class requires the superclass to have an accessible parameterless constructor [Serialization, 1.10]. If you don’t provide such a constructor, subclasses are forced to use the serialization proxy pattern (Item 90).

关于不实现 Serializable 的决定,有一个警告。如果为继承而设计的类不可序列化,则可能需要额外的工作来编写可序列化的子类。子类的常规反序列化,要求超类具有可访问的无参数构造函数 [Serialization, 1.10]。如果不提供这样的构造函数,子类将被迫使用序列化代理模式(Item-90)。

Inner classes (Item 24) should not implement Serializable. They use compiler-generated synthetic fields to store references to enclosing instances and to store values of local variables from enclosing scopes. How these fields correspond to the class definition is unspecified, as are the names of anonymous and local classes. Therefore, the default serialized form of an inner class is illdefined. A static member class can, however, implement Serializable.

内部类(Item-24)不应该实现 Serializable。 它们使用编译器生成的合成字段存储对外围实例的引用,并存储来自外围的局部变量的值。这些字段与类定义的对应关系,就和没有指定匿名类和局部类的名称一样。因此,内部类的默认序列化形式是不确定的。但是,静态成员类可以实现 Serializable 接口。

To summarize, the ease of implementing Serializable is specious. Unless a class is to be used only in a protected environment where versions will never have to interoperate and servers will never be exposed to untrusted data, implementing Serializable is a serious commitment that should be made with great care. Extra caution is warranted if a class permits inheritance.

总而言之,认为实现 Serializable 接口很简单这个观点似是而非。除非类只在受保护的环境中使用,在这种环境中,版本永远不必互操作,服务器永远不会暴露不可信的数据,否则实现 Serializable 接口是一项严肃的事情,应该非常小心。如果类允许继承,则更加需要格外小心。


【87】考虑使用自定义序列化形式

Consider using a custom serialized form

When you are writing a class under time pressure, it is generally appropriate to concentrate your efforts on designing the best API. Sometimes this means releasing a “throwaway” implementation that you know you’ll replace in a future release. Normally this is not a problem, but if the class implements Serializable and uses the default serialized form, you’ll never be able to escape completely from the throwaway implementation. It will dictate the serialized form forever. This is not just a theoretical problem. It happened to several classes in the Java libraries, including BigInteger.

当你在时间紧迫的情况下编写类时,通常应该将精力集中在设计最佳 API 上。有时,这意味着发布一个「一次性」实现,你也知道在将来的版本中会替换它。通常这不是一个问题,但是如果类实现 Serializable 接口并使用默认的序列化形式,你将永远无法完全摆脱这个「一次性」的实现。它将永远影响序列化的形式。这不仅仅是一个理论问题。这种情况发生在 Java 库中的几个类上,包括 BigInteger。

Do not accept the default serialized form without first considering whether it is appropriate. Accepting the default serialized form should be a conscious decision that this encoding is reasonable from the standpoint of flexibility, performance, and correctness. Generally speaking, you should accept the default serialized form only if it is largely identical to the encoding that you would choose if you were designing a custom serialized form.

在没有考虑默认序列化形式是否合适之前,不要接受它。 接受默认的序列化形式应该是一个三思而后行的决定,即从灵活性、性能和正确性的角度综合来看,这种编码是合理的。一般来说,设计自定义序列化形式时,只有与默认序列化形式所选择的编码在很大程度上相同时,才应该接受默认的序列化形式。

The default serialized form of an object is a reasonably efficient encoding of the physical representation of the object graph rooted at the object. In other words, it describes the data contained in the object and in every object that is reachable from this object. It also describes the topology by which all of these objects are interlinked. The ideal serialized form of an object contains only the logical data represented by the object. It is independent of the physical representation.

对象的默认序列化形式,相对于它的物理表示法而言是一种比较有效的编码形式。换句话说,它描述了对象中包含的数据以及从该对象可以访问的每个对象的数据。它还描述了所有这些对象相互关联的拓扑结构。理想的对象序列化形式只包含对象所表示的逻辑数据。它独立于物理表征。

The default serialized form is likely to be appropriate if an object’s physical representation is identical to its logical content. For example, the default serialized form would be reasonable for the following class, which simplistically represents a person’s name:

如果对象的物理表示与其逻辑内容相同,则默认的序列化形式可能是合适的。 例如,默认的序列化形式对于下面的类来说是合理的,它简单地表示一个人的名字:

  1. // Good candidate for default serialized form
  2. public class Name implements Serializable {
  3. /**
  4. * Last name. Must be non-null.
  5. * @serial
  6. */
  7. private final String lastName;
  8. /**
  9. * First name. Must be non-null.
  10. * @serial
  11. */
  12. private final String firstName;
  13. /**
  14. * Middle name, or null if there is none.
  15. * @serial
  16. */
  17. private final String middleName;
  18. ... // Remainder omitted
  19. }

Logically speaking, a name consists of three strings that represent a last name, a first name, and a middle name. The instance fields in Name precisely mirror this logical content.

从逻辑上讲,名字由三个字符串组成,分别表示姓、名和中间名。Name 的实例字段精确地反映了这个逻辑内容。

Even if you decide that the default serialized form is appropriate, you often must provide a readObject method to ensure invariants and security. In the case of Name, the readObject method must ensure that the fields lastName and firstName are non-null. This issue is discussed at length in Items 88 and 90.

即使你认为默认的序列化形式是合适的,你通常也必须提供 readObject 方法来确保不变性和安全性。 对于 Name 类而言, readObject 方法必须确保字段 lastName 和 firstName 是非空的。Item-88 和 Item-90 详细讨论了这个问题。

Note that there are documentation comments on the lastName, firstName, and middleName fields, even though they are private. That is because these private fields define a public API, which is the serialized form of the class, and this public API must be documented. The presence of the @serial tag tells Javadoc to place this documentation on a special page that documents serialized forms.

注意,虽然 lastName、firstName 和 middleName 字段是私有的,但是它们都有文档注释。这是因为这些私有字段定义了一个公共 API,它是类的序列化形式,并且必须对这个公共 API 进行文档化。@serial 标记的存在告诉 Javadoc 将此文档放在一个特殊的页面上,该页面记录序列化的形式。

Near the opposite end of the spectrum from Name, consider the following class, which represents a list of strings (ignoring for the moment that you would probably be better off using one of the standard List implementations):

与 Name 类不同,考虑下面的类,它是另一个极端。它表示一个字符串列表(使用标准 List 实现可能更好,但此时暂不这么做):

  1. // Awful candidate for default serialized form
  2. public final class StringList implements Serializable {
  3. private int size = 0;
  4. private Entry head = null;
  5. private static class Entry implements Serializable {
  6. String data;
  7. Entry next;
  8. Entry previous;
  9. }
  10. ... // Remainder omitted
  11. }

Logically speaking, this class represents a sequence of strings. Physically, it represents the sequence as a doubly linked list. If you accept the default serialized form, the serialized form will painstakingly mirror every entry in the linked list and all the links between the entries, in both directions.

从逻辑上讲,这个类表示字符串序列。在物理上,它将序列表示为双向链表。如果接受默认的序列化形式,该序列化形式将不遗余力地镜像出链表中的所有项,以及这些项之间的所有双向链接。

Using the default serialized form when an object’s physical representation differs substantially from its logical data content has four disadvantages:

当对象的物理表示与其逻辑数据内容有很大差异时,使用默认的序列化形式有四个缺点:

  • It permanently ties the exported API to the current internal representation. In the above example, the private StringList.Entry class becomes part of the public API. If the representation is changed in a future release, the StringList class will still need to accept the linked list representation on input and generate it on output. The class will never be rid of all the code dealing with linked list entries, even if it doesn’t use them anymore.

它将导出的 API 永久地绑定到当前的内部实现。 在上面的例子中,私有 StringList.Entry 类成为公共 API 的一部分。如果在将来的版本中更改了实现,StringList 类仍然需要接受链表形式的输出,并产生链表形式的输出。这个类永远也摆脱不掉处理链表项所需要的所有代码,即使不再使用链表作为内部数据结构。

  • It can consume excessive space. In the above example, the serialized form unnecessarily represents each entry in the linked list and all the links. These entries and links are mere implementation details, not worthy of inclusion in the serialized form. Because the serialized form is excessively large, writing it to disk or sending it across the network will be excessively slow.

它会占用过多的空间。 在上面的示例中,序列化的形式不必要地表示链表中的每个条目和所有链接关系。这些链表项以及链接只不过是实现细节,不值得记录在序列化形式中。因为这样的序列化形式过于庞大,将其写入磁盘或通过网络发送将非常慢。

  • It can consume excessive time. The serialization logic has no knowledge of the topology of the object graph, so it must go through an expensive graph traversal. In the example above, it would be sufficient simply to follow the next references.

它会消耗过多的时间。 序列化逻辑不知道对象图的拓扑结构,因此必须遍历开销很大的图。在上面的例子中,只要遵循 next 的引用就足够了。

  • It can cause stack overflows. The default serialization procedure performs a recursive traversal of the object graph, which can cause stack overflows even for moderately sized object graphs. Serializing a StringList instance with 1,000–1,800 elements generates a StackOverflowError on my machine. Surprisingly, the minimum list size for which serialization causes a stack overflow varies from run to run (on my machine). The minimum list size that exhibits this problem may depend on the platform implementation and command-line flags; some implementations may not have this problem at all.

它可能导致堆栈溢出。 默认的序列化过程执行对象图的递归遍历,即使对于中等规模的对象图,这也可能导致堆栈溢出。用 1000-1800 个元素序列化 StringList 实例会在我的机器上生成一个 StackOverflowError。令人惊讶的是,序列化导致堆栈溢出的最小列表大小因运行而异(在我的机器上)。显示此问题的最小列表大小可能取决于平台实现和命令行标志;有些实现可能根本没有这个问题。

A reasonable serialized form for StringList is simply the number of strings in the list, followed by the strings themselves. This constitutes the logical data represented by a StringList, stripped of the details of its physical representation. Here is a revised version of StringList with writeObject and readObject methods that implement this serialized form. As a reminder, the transient modifier indicates that an instance field is to be omitted from a class’s default serialized form:

StringList 的合理序列化形式就是列表中的字符串数量,然后是字符串本身。这构成了由 StringList 表示的逻辑数据,去掉了其物理表示的细节。下面是修改后的 StringList 版本,带有实现此序列化形式的 writeObject 和 readObject 方法。提醒一下,transient 修饰符表示要从类的默认序列化表单中省略该实例字段:

  1. // StringList with a reasonable custom serialized form
  2. public final class StringList implements Serializable {
  3. private transient int size = 0;
  4. private transient Entry head = null;
  5. // No longer Serializable!
  6. private static class Entry {
  7. String data;
  8. Entry next;
  9. Entry previous;
  10. }
  11. // Appends the specified string to the list
  12. public final void add(String s) { ... }
  13. /**
  14. * Serialize this {@code StringList} instance.
  15. **
  16. @serialData The size of the list (the number of strings
  17. * it contains) is emitted ({@code int}), followed by all of
  18. * its elements (each a {@code String}), in the proper
  19. * sequence.
  20. */
  21. private void writeObject(ObjectOutputStream s) throws IOException {
  22. s.defaultWriteObject();
  23. s.writeInt(size);
  24. // Write out all elements in the proper order.
  25. for (Entry e = head; e != null; e = e.next)
  26. s.writeObject(e.data);
  27. }
  28. private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
  29. s.defaultReadObject();
  30. int numElements = s.readInt();
  31. // Read in all elements and insert them in list
  32. for (int i = 0; i < numElements; i++)
  33. add((String) s.readObject());
  34. }
  35. ... // Remainder omitted
  36. }

The first thing writeObject does is to invoke defaultWriteObject, and the first thing readObject does is to invoke defaultReadObject, even though all of StringList’s fields are transient. You may hear it said that if all of a class’s instance fields are transient, you can dispense with invoking defaultWriteObject and defaultReadObject, but the serialization specification requires you to invoke them regardless. The presence of these calls makes it possible to add nontransient instance fields in a later release while preserving backward and forward compatibility. If an instance is serialized in a later version and deserialized in an earlier version, the added fields will be ignored. Had the earlier version’s readObject method failed to invoke defaultReadObject, the deserialization would fail with a StreamCorruptedException.

writeObject 做的第一件事是调用 defaultWriteObject, readObject 做的第一件事是调用 defaultReadObject,即使 StringList 的所有字段都是 transient 的。你可能听说过,如果一个类的所有实例字段都是 transient 的,那么你可以不调用 defaultWriteObject 和 defaultReadObject,但是序列化规范要求你无论如何都要调用它们。这些调用的存在使得在以后的版本中添加非瞬态实例字段成为可能,同时保留了向后和向前兼容性。如果实例在较晚的版本中序列化,在较早的版本中反序列化,则会忽略添加的字段。如果早期版本的 readObject 方法调用 defaultReadObject 失败,反序列化将失败,并出现 StreamCorruptedException。

Note that there is a documentation comment on the writeObject method, even though it is private. This is analogous to the documentation comment on the private fields in the Name class. This private method defines a public API, which is the serialized form, and that public API should be documented. Like the @serial tag for fields, the @serialData tag for methods tells the Javadoc utility to place this documentation on the serialized forms page.

注意,writeObject 方法有一个文档注释,即使它是私有的。这类似于 Name 类中私有字段的文档注释。这个私有方法定义了一个公共 API,它是序列化的形式,并且应该对该公共 API 进行文档化。与字段的 @serial 标记一样,方法的 @serialData 标记告诉 Javadoc 实用工具将此文档放在序列化形式页面上。

To lend some sense of scale to the earlier performance discussion, if the average string length is ten characters, the serialized form of the revised version of StringList occupies about half as much space as the serialized form of the original. On my machine, serializing the revised version of StringList is over twice as fast as serializing the original version, with a list length of ten. Finally, there is no stack overflow problem in the revised form and hence no practical upper limit to the size of StringList that can be serialized.

为了给前面的性能讨论提供一定的伸缩性,如果平均字符串长度是 10 个字符,那么经过修改的 StringList 的序列化形式占用的空间大约是原始字符串序列化形式的一半。在我的机器上,序列化修订后的 StringList 的速度是序列化原始版本的两倍多,列表长度为 10。最后,在修改后的形式中没有堆栈溢出问题,因此对于可序列化的 StringList 的大小没有实际的上限。

While the default serialized form would be bad for StringList, there are classes for which it would be far worse. For StringList, the default serialized form is inflexible and performs badly, but it is correct in the sense that serializing and deserializing a StringList instance yields a faithful copy of the original object with all of its invariants intact. This is not the case for any object whose invariants are tied to implementation-specific details.

虽然默认的序列化形式对 StringList 不好,但是对于某些类来说,情况会更糟。对于 StringList,默认的序列化形式是不灵活的,并且执行得很糟糕,但是它是正确的,因为序列化和反序列化 StringList 实例会生成原始对象的无差错副本,而所有不变量都是完整的。对于任何不变量绑定到特定于实现的细节的对象,情况并非如此。

For example, consider the case of a hash table. The physical representation is a sequence of hash buckets containing key-value entries. The bucket that an entry resides in is a function of the hash code of its key, which is not, in general, guaranteed to be the same from implementation to implementation. In fact, it isn’t even guaranteed to be the same from run to run. Therefore, accepting the default serialized form for a hash table would constitute a serious bug. Serializing and deserializing the hash table could yield an object whose invariants were seriously corrupt.

例如,考虑哈希表的情况。物理表示是包含「键-值」项的哈希桶序列。一个项所在的桶是其键的散列代码的函数,通常情况下,不能保证从一个实现到另一个实现是相同的。事实上,它甚至不能保证每次运行都是相同的。因此,接受哈希表的默认序列化形式将构成严重的 bug。对哈希表进行序列化和反序列化可能会产生一个不变量严重损坏的对象。

Whether or not you accept the default serialized form, every instance field that isn’t labeled transient will be serialized when the defaultWriteObject method is invoked. Therefore, every instance field that can be declared transient should be. This includes derived fields, whose values can be computed from primary data fields, such as a cached hash value. It also includes fields whose values are tied to one particular run of the JVM, such as a long field representing a pointer to a native data structure. Before deciding to make a field nontransient, convince yourself that its value is part of the logical state of the object. If you use a custom serialized form, most or all of the instance fields should be labeled transient, as in the StringList example above.

无论你是否接受默认的序列化形式,当调用 defaultWriteObject 方法时,没有标记为 transient 的每个实例字段都会被序列化。因此,可以声明为 transient 的每个实例字段都应该做这个声明。这包括派生字段,其值可以从主数据字段(如缓存的哈希值)计算。它还包括一些字段,这些字段的值与 JVM 的一个特定运行相关联,比如表示指向本机数据结构指针的 long 字段。在决定使字段非 transient 之前,请确信它的值是对象逻辑状态的一部分。 如果使用自定义序列化表单,大多数或所有实例字段都应该标记为 transient,如上面的 StringList 示例所示。

If you are using the default serialized form and you have labeled one or more fields transient, remember that these fields will be initialized to their default values when an instance is deserialized: null for object reference fields, zero for numeric primitive fields, and false for boolean fields [JLS, 4.12.5]. If these values are unacceptable for any transient fields, you must provide a readObject method that invokes the defaultReadObject method and then restores transient fields to acceptable values (Item 88). Alternatively, these fields can be lazily initialized the first time they are used (Item 83).

如果使用默认的序列化形式,并且标记了一个或多个字段为 transient,请记住,当反序列化实例时,这些字段将初始化为默认值:对象引用字段为 null,数字基本类型字段为 0,布尔字段为 false [JLS, 4.12.5]。如果这些值对于任何 transient 字段都是不可接受的,则必须提供一个 readObject 方法,该方法调用 defaultReadObject 方法,然后将 transient 字段恢复为可接受的值(Item-88)。或者,可以采用延迟初始化(Item-83),在第一次使用这些字段时初始化它们。

Whether or not you use the default serialized form, you must impose any synchronization on object serialization that you would impose on any other method that reads the entire state of the object. So, for example, if you have a thread-safe object (Item 82) that achieves its thread safety by synchronizing every method and you elect to use the default serialized form, use the following write-Object method:

无论你是否使用默认的序列化形式,必须对对象序列化强制执行任何同步操作,就像对读取对象的整个状态的任何其他方法强制执行的那样。 例如,如果你有一个线程安全的对象(Item-82),它通过同步每个方法来实现线程安全,并且你选择使用默认的序列化形式,那么使用以下 write-Object 方法:

  1. // writeObject for synchronized class with default serialized form
  2. private synchronized void writeObject(ObjectOutputStream s) throws IOException {
  3. s.defaultWriteObject();
  4. }

If you put synchronization in the writeObject method, you must ensure that it adheres to the same lock-ordering constraints as other activities, or you risk a resource-ordering deadlock [Goetz06, 10.1.5].

如果将同步放在 writeObject 方法中,则必须确保它遵守与其他活动相同的锁排序约束,否则将面临资源排序死锁的风险 [Goetz06, 10.1.5]。

Regardless of what serialized form you choose, declare an explicit serial version UID in every serializable class you write. This eliminates the serial version UID as a potential source of incompatibility (Item 86). There is also a small performance benefit. If no serial version UID is provided, an expensive computation is performed to generate one at runtime.

无论选择哪种序列化形式,都要在编写的每个可序列化类中声明显式的序列版本 UID。 这消除了序列版本 UID 成为不兼容性的潜在来源(Item-86)。这么做还能获得一个小的性能优势。如果没有提供序列版本 UID,则需要执行高开销的计算在运行时生成一个 UID。

Declaring a serial version UID is simple. Just add this line to your class:

声明序列版本 UID 很简单,只要在你的类中增加这一行:

  1. private static final long serialVersionUID = randomLongValue;

If you write a new class, it doesn’t matter what value you choose for randomLongValue. You can generate the value by running the serialver utility on the class, but it’s also fine to pick a number out of thin air. It is not required that serial version UIDs be unique. If you modify an existing class that lacks a serial version UID, and you want the new version to accept existing serialized instances, you must use the value that was automatically generated for the old version. You can get this number by running the serialver utility on the old version of the class—the one for which serialized instances exist.

如果你编写一个新类,为 randomLongValue 选择什么值并不重要。你可以通过在类上运行 serialver 实用工具来生成该值,但是也可以凭空选择一个数字。串行版本 UID 不需要是唯一的。如果修改缺少串行版本 UID 的现有类,并且希望新版本接受现有的序列化实例,则必须使用为旧版本自动生成的值。你可以通过在类的旧版本上运行 serialver 实用工具(序列化实例存在于旧版本上)来获得这个数字。

If you ever want to make a new version of a class that is incompatible with existing versions, merely change the value in the serial version UID declaration. This will cause attempts to deserialize serialized instances of previous versions to throw an InvalidClassException. Do not change the serial version UID unless you want to break compatibility with all existing serialized instances of a class.

如果你希望创建一个新版本的类,它与现有版本不兼容,如果更改序列版本 UID 声明中的值,这将导致反序列化旧版本的序列化实例的操作引发 InvalidClassException。不要更改序列版本 UID,除非你想破坏与现有序列化所有实例的兼容性。

To summarize, if you have decided that a class should be serializable (Item 86), think hard about what the serialized form should be. Use the default serialized form only if it is a reasonable description of the logical state of the object; otherwise design a custom serialized form that aptly describes the object. You should allocate as much time to designing the serialized form of a class as you allocate to designing an exported method (Item 51). Just as you can’t eliminate exported methods from future versions, you can’t eliminate fields from the serialized form; they must be preserved forever to ensure serialization compatibility. Choosing the wrong serialized form can have a permanent, negative impact on the complexity and performance of a class.

总而言之,如果你已经决定一个类应该是可序列化的(Item-86),那么请仔细考虑一下序列化的形式应该是什么。只有在合理描述对象的逻辑状态时,才使用默认的序列化形式;否则,设计一个适合描述对象的自定义序列化形式。设计类的序列化形式应该和设计导出方法花的时间应该一样多,都应该严谨对待(Item-51)。正如不能从未来版本中删除导出的方法一样,也不能从序列化形式中删除字段;必须永远保存它们,以确保序列化兼容性。选择错误的序列化形式可能会对类的复杂性和性能产生永久性的负面影响。


【88】防御性地编写 readObject 方法

Write readObject methods defensively

Item 50 contains an immutable date-range class with mutable private Date fields. The class goes to great lengths to preserve its invariants and immutability by defensively copying Date objects in its constructor and accessors. Here is the class:

Item-50 包含一个具有可变私有 Date 字段的不可变日期范围类。该类通过在构造函数和访问器中防御性地复制 Date 对象,不遗余力地保持其不变性和不可变性。它是这样的:

  1. // Immutable class that uses defensive copying
  2. public final class Period {
  3. private final Date start;
  4. private final Date end;
  5. /**
  6. * @param start the beginning of the period
  7. * @param end the end of the period; must not precede start
  8. * @throws IllegalArgumentException if start is after end
  9. * @throws NullPointerException if start or end is null
  10. */
  11. public Period(Date start, Date end) {
  12. this.start = new Date(start.getTime());
  13. this.end = new Date(end.getTime());
  14. if (this.start.compareTo(this.end) > 0)
  15. throw new IllegalArgumentException(start + " after " + end);
  16. }
  17. public Date start () { return new Date(start.getTime()); }
  18. public Date end () { return new Date(end.getTime()); }
  19. public String toString() { return start + " - " + end; }
  20. ... // Remainder omitted
  21. }

Suppose you decide that you want this class to be serializable. Because the physical representation of a Period object exactly mirrors its logical data content, it is not unreasonable to use the default serialized form (Item 87). Therefore, it might seem that all you have to do to make the class serializable is to add the words implements Serializable to the class declaration. If you did so, however, the class would no longer guarantee its critical invariants.

假设你决定让这个类可序列化。由于 Period 对象的物理表示精确地反映了它的逻辑数据内容,所以使用默认的序列化形式是合理的(Item-87)。因此,要使类可序列化,似乎只需将实现 Serializable 接口。但是,如果这样做,该类将不再保证它的临界不变量。

The problem is that the readObject method is effectively another public constructor, and it demands all of the same care as any other constructor. Just as a constructor must check its arguments for validity (Item 49) and make defensive copies of parameters where appropriate (Item 50), so must a readObject method. If a readObject method fails to do either of these things, it is a relatively simple matter for an attacker to violate the class’s invariants.

问题是 readObject 方法实际上是另一个公共构造函数,它与任何其他构造函数有相同的注意事项。如,构造函数必须检查其参数的有效性(Item-49)并在适当的地方制作防御性副本(Item-50)一样,readObject 方法也必须这样做。如果 readObject 方法没有做到这两件事中的任何一件,那么攻击者就很容易违反类的不变性。

Loosely speaking, readObject is a constructor that takes a byte stream as its sole parameter. In normal use, the byte stream is generated by serializing a normally constructed instance. The problem arises when readObject is presented with a byte stream that is artificially constructed to generate an object that violates the invariants of its class. Such a byte stream can be used to create an impossible object, which could not have been created using a normal constructor.

不严格地说,readObject 是一个构造函数,它唯一的参数是字节流。在正常使用中,字节流是通过序列化一个正常构造的实例生成的。当 readObject 呈现一个字节流时,问题就出现了,这个字节流是人为构造的,用来生成一个违反类不变性的对象。这样的字节流可用于创建一个不可思议的对象,而该对象不能使用普通构造函数创建。

Assume that we simply added implements Serializable to the class declaration for Period. This ugly program would then generate a Period instance whose end precedes its start. The casts on byte values whose highorder bit is set is a consequence of Java’s lack of byte literals combined with the unfortunate decision to make the byte type signed:

假设我们只是简单地让 Period 实现 Serializable 接口。然后,这个有问题的程序将生成一个 Period 实例,其结束比起始时间还要早。对其高位位设置的字节值进行强制转换,这是由于 Java 缺少字节字面值,再加上让字节类型签名的错误决定导致的:

  1. public class BogusPeriod {
  2. // Byte stream couldn't have come from a real Period instance!
  3. private static final byte[] serializedForm = {
  4. (byte)0xac, (byte)0xed, 0x00, 0x05, 0x73, 0x72, 0x00, 0x06,
  5. 0x50, 0x65, 0x72, 0x69, 0x6f, 0x64, 0x40, 0x7e, (byte)0xf8,
  6. 0x2b, 0x4f, 0x46, (byte)0xc0, (byte)0xf4, 0x02, 0x00, 0x02,
  7. 0x4c, 0x00, 0x03, 0x65, 0x6e, 0x64, 0x74, 0x00, 0x10, 0x4c,
  8. 0x6a, 0x61, 0x76, 0x61, 0x2f, 0x75, 0x74, 0x69, 0x6c, 0x2f,
  9. 0x44, 0x61, 0x74, 0x65, 0x3b, 0x4c, 0x00, 0x05, 0x73, 0x74,
  10. 0x61, 0x72, 0x74, 0x71, 0x00, 0x7e, 0x00, 0x01, 0x78, 0x70,
  11. 0x73, 0x72, 0x00, 0x0e, 0x6a, 0x61, 0x76, 0x61, 0x2e, 0x75,
  12. 0x74, 0x69, 0x6c, 0x2e, 0x44, 0x61, 0x74, 0x65, 0x68, 0x6a,
  13. (byte)0x81, 0x01, 0x4b, 0x59, 0x74, 0x19, 0x03, 0x00, 0x00,
  14. 0x78, 0x70, 0x77, 0x08, 0x00, 0x00, 0x00, 0x66, (byte)0xdf,
  15. 0x6e, 0x1e, 0x00, 0x78, 0x73, 0x71, 0x00, 0x7e, 0x00, 0x03,
  16. 0x77, 0x08, 0x00, 0x00, 0x00, (byte)0xd5, 0x17, 0x69, 0x22,
  17. 0x00, 0x78
  18. };
  19. public static void main(String[] args) {
  20. Period p = (Period) deserialize(serializedForm);
  21. System.out.println(p);
  22. }
  23. // Returns the object with the specified serialized form
  24. static Object deserialize(byte[] sf) {
  25. try {
  26. return new ObjectInputStream(new ByteArrayInputStream(sf)).readObject();
  27. } catch (IOException | ClassNotFoundException e) {
  28. throw new IllegalArgumentException(e);
  29. }
  30. }
  31. }

The byte array literal used to initialize serializedForm was generated by serializing a normal Period instance and hand-editing the resulting byte stream. The details of the stream are unimportant to the example, but if you’re curious, the serialization byte-stream format is described in the Java Object Serialization Specification [Serialization, 6]. If you run this program, it prints Fri Jan 01 12:00:00 PST 1999 - Sun Jan 01 12:00:00 PST 1984. Simply declaring Period serializable enabled us to create an object that violates its class invariants.

用于初始化 serializedForm 的字节数组文本是通过序列化一个普通 Period 实例并手工编辑得到的字节流生成的。流的细节对示例并不重要,但是如果你感兴趣,可以在《JavaTM Object Serialization Specification》[serialization, 6]中查到序列化字节流的格式描述。如果你运行这个程序,它将打印 Fri Jan 01 12:00:00 PST 1999 - Sun Jan 01 12:00:00 PST 1984。只需声明 Period 可序列化,就可以创建一个违反其类不变性的对象。

To fix this problem, provide a readObject method for Period that calls defaultReadObject and then checks the validity of the deserialized object. If the validity check fails, the readObject method throws InvalidObjectException, preventing the deserialization from completing:

要解决此问题,请为 Period 提供一个 readObject 方法,该方法调用 defaultReadObject,然后检查反序列化对象的有效性。如果有效性检查失败,readObject 方法抛出 InvalidObjectException,阻止反序列化完成:

  1. // readObject method with validity checking - insufficient!
  2. private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
  3. s.defaultReadObject();
  4. // Check that our invariants are satisfied
  5. if (start.compareTo(end) > 0)
  6. throw new InvalidObjectException(start +" after "+ end);
  7. }

While this prevents an attacker from creating an invalid Period instance, there is a more subtle problem still lurking. It is possible to create a mutable Period instance by fabricating a byte stream that begins with a valid Period instance and then appends extra references to the private Date fields internal to the Period instance. The attacker reads the Period instance from the ObjectInputStream and then reads the “rogue object references” that were appended to the stream. These references give the attacker access to the objects referenced by the private Date fields within the Period object. By mutating these Date instances, the attacker can mutate the Period instance. The following class demonstrates this attack:

虽然这可以防止攻击者创建无效的 Period 实例,但还有一个更微妙的问题仍然潜伏着。可以通过字节流来创建一个可变的 Period 实例,该字节流以一个有效的 Period 实例开始,然后向 Period 实例内部的私有日期字段追加额外的引用。攻击者从 ObjectInputStream 中读取 Period 实例,然后读取附加到流中的「恶意对象引用」。这些引用使攻击者能够访问 Period 对象中的私有日期字段引用的对象。通过修改这些日期实例,攻击者可以修改 Period 实例。下面的类演示了这种攻击:

  1. public class MutablePeriod {
  2. // A period instance
  3. public final Period period;
  4. // period's start field, to which we shouldn't have access
  5. public final Date start;
  6. // period's end field, to which we shouldn't have access
  7. public final Date end;
  8. public MutablePeriod() {
  9. try {
  10. ByteArrayOutputStream bos = new ByteArrayOutputStream();
  11. ObjectOutputStream out = new ObjectOutputStream(bos);
  12. // Serialize a valid Period instance
  13. out.writeObject(new Period(new Date(), new Date()));
  14. /*
  15. * Append rogue "previous object refs" for internal
  16. * Date fields in Period. For details, see "Java
  17. * Object Serialization Specification," Section 6.4.
  18. */
  19. byte[] ref = { 0x71, 0, 0x7e, 0, 5 }; // Ref #5
  20. bos.write(ref); // The start field
  21. ref[4] = 4; // Ref # 4
  22. bos.write(ref); // The end field
  23. // Deserialize Period and "stolen" Date references
  24. ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
  25. period = (Period) in.readObject();
  26. start = (Date) in.readObject();
  27. end = (Date) in.readObject();
  28. } catch (IOException | ClassNotFoundException e) {
  29. throw new AssertionError(e);
  30. }
  31. }
  32. }

To see the attack in action, run the following program:

要查看攻击的实际效果,请运行以下程序:

  1. public static void main(String[] args) {
  2. MutablePeriod mp = new MutablePeriod();
  3. Period p = mp.period;
  4. Date pEnd = mp.end;
  5. // Let's turn back the clock
  6. pEnd.setYear(78);
  7. System.out.println(p);
  8. // Bring back the 60s!
  9. pEnd.setYear(69);
  10. System.out.println(p);
  11. }

In my locale, running this program produces the following output:

在我的语言环境中,运行这个程序会产生以下输出:

  1. Wed Nov 22 00:21:29 PST 2017 - Wed Nov 22 00:21:29 PST 1978
  2. Wed Nov 22 00:21:29 PST 2017 - Sat Nov 22 00:21:29 PST 1969

While the Period instance is created with its invariants intact, it is possible to modify its internal components at will. Once in possession of a mutable Period instance, an attacker might cause great harm by passing the instance to a class that depends on Period’s immutability for its security. This is not so farfetched: there are classes that depend on String’s immutability for their security.

虽然创建 Period 实例时保留了它的不变性,但是可以随意修改它的内部组件。一旦拥有一个可变的 Period 实例,攻击者可能会将实例传递给一个依赖于 Period 的不变性来保证其安全性的类,从而造成极大的危害。这并不是牵强附会的:有些类依赖于 String 的不变性来保证其安全。

The source of the problem is that Period’s readObject method is not doing enough defensive copying. When an object is deserialized, it is critical to defensively copy any field containing an object reference that a client must not possess. Therefore, every serializable immutable class containing private mutable components must defensively copy these components in its readObject method. The following readObject method suffices to ensure Period’s invariants and to maintain its immutability:

问题的根源在于 Period 的 readObject 方法没有进行足够的防御性复制。当对象被反序列化时,对任何客户端不能拥有的对象引用的字段进行防御性地复制至关重要。 因此,对于每个可序列化的不可变类,如果它包含了私有的可变组件,那么在它的 readObjec 方法中,必须要对这些组件进行防御性地复制。下面的 readObject 方法足以保证周期的不变性,并保持其不变性:

  1. // readObject method with defensive copying and validity checking
  2. private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
  3. s.defaultReadObject();
  4. // Defensively copy our mutable components
  5. start = new Date(start.getTime());
  6. end = new Date(end.getTime());
  7. // Check that our invariants are satisfied
  8. if (start.compareTo(end) > 0)
  9. throw new InvalidObjectException(start +" after "+ end);
  10. }

Note that the defensive copy is performed prior to the validity check and that we did not use Date’s clone method to perform the defensive copy. Both of these details are required to protect Period against attack (Item 50). Note also that defensive copying is not possible for final fields. To use the readObject method, we must make the start and end fields nonfinal. This is unfortunate, but it is the lesser of two evils. With the new readObject method in place and the final modifier removed from the start and end fields, the MutablePeriod class is rendered ineffective. The above attack program now generates this output:

注意,防御副本是在有效性检查之前执行的,我们没有使用 Date 的 clone 方法来执行防御副本。这两个细节对于保护 Period 免受攻击是必要的(第 50 项)。还要注意,防御性复制不可能用于 final 字段。要使用 readObject 方法,必须使 start 和 end 字段非 final。这是不幸的,但却是权衡利弊后的方案。使用新的 readObject 方法,并从 start 和 end 字段中删除 final 修饰符,MutablePeriod 类将无效。上面的攻击程序现在生成这个输出:

  1. Wed Nov 22 00:23:41 PST 2017 - Wed Nov 22 00:23:41 PST 2017
  2. Wed Nov 22 00:23:41 PST 2017 - Wed Nov 22 00:23:41 PST 2017

Here is a simple litmus test for deciding whether the default readObject method is acceptable for a class: would you feel comfortable adding a public constructor that took as parameters the values for each nontransient field in the object and stored the values in the fields with no validation whatsoever? If not, you must provide a readObject method, and it must perform all the validity checking and defensive copying that would be required of a constructor. Alternatively, you can use the serialization proxy pattern (Item 90). This pattern is highly recommended because it takes much of the effort out of safe deserialization.

下面是一个简单的测试,用于判断默认 readObject 方法是否可用于类:你是否愿意添加一个公共构造函数,该构造函数将对象中每个非 transient 字段的值作为参数,并在没有任何验证的情况下将值存储在字段中?如果没有,则必须提供 readObject 方法,并且它必须执行构造函数所需的所有有效性检查和防御性复制。或者,你可以使用序列化代理模式(Item-90)。强烈推荐使用这种模式,否则会在安全反序列化方面花费大量精力。

There is one other similarity between readObject methods and constructors that applies to nonfinal serializable classes. Like a constructor, a readObject method must not invoke an overridable method, either directly or indirectly (Item 19). If this rule is violated and the method in question is overridden, the overriding method will run before the subclass’s state has been deserialized. A program failure is likely to result [Bloch05, Puzzle 91].

readObject 方法和构造函数之间还有一个相似之处,适用于非 final 序列化类。与构造函数一样,readObject 方法不能直接或间接调用可覆盖的方法(Item-19)。如果违反了这条规则,并且涉及的方法被覆盖,则覆盖方法将在子类的状态反序列化之前运行。很可能导致程序失败 [Bloch05, Puzzle 91]。

To summarize, anytime you write a readObject method, adopt the mindset that you are writing a public constructor that must produce a valid instance regardless of what byte stream it is given. Do not assume that the byte stream represents an actual serialized instance. While the examples in this item concern a class that uses the default serialized form, all of the issues that were raised apply equally to classes with custom serialized forms. Here, in summary form, are the guidelines for writing a readObject method:

总而言之,无论何时编写 readObject 方法,都要采用这样的思维方式,即编写一个公共构造函数,该构造函数必须生成一个有效的实例,而不管给定的是什么字节流。不要假设字节流表示实际的序列化实例。虽然本条目中的示例涉及使用默认序列化形式的类,但是所引发的所有问题都同样适用于具有自定义序列化形式的类。下面是编写 readObject 方法的指导原则:

  • For classes with object reference fields that must remain private, defensively copy each object in such a field. Mutable components of immutable classes fall into this category.

对象引用字段必须保持私有的的类,应防御性地复制该字段中的每个对象。不可变类的可变组件属于这一类。

  • Check any invariants and throw an InvalidObjectException if a check fails. The checks should follow any defensive copying.

检查任何不变量,如果检查失败,则抛出 InvalidObjectException。检查动作应该跟在任何防御性复制之后。

  • If an entire object graph must be validated after it is deserialized, use the ObjectInputValidation interface (not discussed in this book).

如果必须在反序列化后验证整个对象图,那么使用 ObjectInputValidation 接口(在本书中没有讨论)。

  • Do not invoke any overridable methods in the class, directly or indirectly.

不要直接或间接地调用类中任何可被覆盖的方法。


【88】对于实例控制,枚举类型优于 readResolve

For instance control, prefer enum types to readResolve

Item 3 describes the Singleton pattern and gives the following example of a singleton class. This class restricts access to its constructor to ensure that only a single instance is ever created:

Item-3 描述了单例模式,并给出了下面的单例类示例。该类限制对其构造函数的访问,以确保只创建一个实例:

  1. public class Elvis {
  2. public static final Elvis INSTANCE = new Elvis();
  3. private Elvis() { ... }
  4. public void leaveTheBuilding() { ... }
  5. }

As noted in Item 3, this class would no longer be a singleton if the words implements Serializable were added to its declaration. It doesn’t matter whether the class uses the default serialized form or a custom serialized form (Item 87), nor does it matter whether the class provides an explicit readObject method (Item 88). Any readObject method, whether explicit or default, returns a newly created instance, which will not be the same instance that was created at class initialization time.

如 Item-3 所述,如果实现 Serializable 接口,该类将不再是单例的。类使用默认序列化形式还是自定义序列化形式并不重要(Item-87),类是否提供显式 readObject 方法也不重要(Item-88)。任何 readObject 方法,不管是显式的还是默认的,都会返回一个新创建的实例,这个实例与类初始化时创建的实例不同。

The readResolve feature allows you to substitute another instance for the one created by readObject [Serialization, 3.7]. If the class of an object being deserialized defines a readResolve method with the proper declaration, this method is invoked on the newly created object after it is deserialized. The object reference returned by this method is then returned in place of the newly created object. In most uses of this feature, no reference to the newly created object is retained, so it immediately becomes eligible for garbage collection.

readResolve 特性允许你用另一个实例替换 readObject[Serialization, 3.7] 创建的实例。如果正在反序列化的对象的类定义了 readResolve 方法,新创建的对象反序列化之后,将在该对象上调用该方法。该方法返回的对象引用将代替新创建的对象返回。在该特性的大多数使用中,不保留对新创建对象的引用,因此它立即就有资格进行垃圾收集。

If the Elvis class is made to implement Serializable, the following readResolve method suffices to guarantee the singleton property:

如果 Elvis 类要实现 Serializable 接口,下面的 readResolve 方法就足以保证其单例属性:

  1. // readResolve for instance control - you can do better!
  2. private Object readResolve() {
  3. // Return the one true Elvis and let the garbage collector
  4. // take care of the Elvis impersonator.
  5. return INSTANCE;
  6. }

This method ignores the deserialized object, returning the distinguished Elvis instance that was created when the class was initialized. Therefore, the serialized form of an Elvis instance need not contain any real data; all instance fields should be declared transient. In fact, if you depend on readResolve for instance control, all instance fields with object reference types must be declared transient. Otherwise, it is possible for a determined attacker to secure a reference to the deserialized object before its readResolve method is run, using a technique that is somewhat similar to the MutablePeriod attack in Item 88.

此方法忽略反序列化对象,返回初始化类时创建的特殊 Elvis 实例。因此,Elvis 实例的序列化形式不需要包含任何实际数据;所有实例字段都应该声明为 transient。事实上,如果你依赖 readResolve 进行实例控制,那么所有具有对象引用类型的实例字段都必须声明为 transient。 否则,有的攻击者有可能在运行反序列化对象的 readResolve 方法之前保护对该对象的引用,使用的技术有点类似于 Item-88 中的 MutablePeriod 攻击。

The attack is a bit complicated, but the underlying idea is simple. If a singleton contains a nontransient object reference field, the contents of this field will be deserialized before the singleton’s readResolve method is run. This allows a carefully crafted stream to “steal” a reference to the originally deserialized singleton at the time the contents of the object reference field are deserialized.

攻击有点复杂,但其基本思想很简单。如果单例包含一个非 transient 对象引用字段,则在运行单例的 readResolve 方法之前,将对该字段的内容进行反序列化。这允许一个精心设计的流在对象引用字段的内容被反序列化时「窃取」对原来反序列化的单例对象的引用。

Here’s how it works in more detail. First, write a “stealer” class that has both a readResolve method and an instance field that refers to the serialized singleton in which the stealer “hides.” In the serialization stream, replace the singleton’s nontransient field with an instance of the stealer. You now have a circularity: the singleton contains the stealer, and the stealer refers to the singleton.

下面是它的工作原理。首先,编写一个 stealer 类,该类具有 readResolve 方法和一个实例字段,该实例字段引用序列化的单例,其中 stealer 「隐藏」在其中。在序列化流中,用一个 stealer 实例替换单例的非 transient 字段。现在你有了一个循环:单例包含了 stealer,而 stealer 引用了单例。

Because the singleton contains the stealer, the stealer’s readResolve method runs first when the singleton is deserialized. As a result, when the stealer’s readResolve method runs, its instance field still refers to the partially deserialized (and as yet unresolved) singleton.

因为单例包含 stealer,所以当反序列化单例时,窃取器的 readResolve 方法首先运行。因此,当 stealer 的 readResolve 方法运行时,它的实例字段仍然引用部分反序列化(且尚未解析)的单例。

The stealer’s readResolve method copies the reference from its instance field into a static field so that the reference can be accessed after the readResolve method runs. The method then returns a value of the correct type for the field in which it’s hiding. If it didn’t do this, the VM would throw a ClassCastException when the serialization system tried to store the stealer reference into this field.

stealer 的 readResolve 方法将引用从其实例字段复制到静态字段,以便在 readResolve 方法运行后访问引用。然后,该方法为其隐藏的字段返回正确类型的值。如果不这样做,当序列化系统试图将 stealer 引用存储到该字段时,VM 将抛出 ClassCastException。

To make this concrete, consider the following broken singleton:

要使问题具体化,请考虑以下被破坏的单例:

  1. // Broken singleton - has nontransient object reference field!
  2. public class Elvis implements Serializable {
  3. public static final Elvis INSTANCE = new Elvis();
  4. private Elvis() { }
  5. private String[] favoriteSongs ={ "Hound Dog", "Heartbreak Hotel" };
  6. public void printFavorites() {
  7. System.out.println(Arrays.toString(favoriteSongs));
  8. }
  9. private Object readResolve() {
  10. return INSTANCE;
  11. }
  12. }

Here is a “stealer” class, constructed as per the description above:

这里是一个 stealer 类,按照上面的描述构造:

  1. public class ElvisStealer implements Serializable {
  2. static Elvis impersonator;
  3. private Elvis payload;
  4. private Object readResolve() {
  5. // Save a reference to the "unresolved" Elvis instance
  6. impersonator = payload;
  7. // Return object of correct type for favoriteSongs field
  8. return new String[] { "A Fool Such as I" };
  9. }
  10. private static final long serialVersionUID =0;
  11. }

Finally, here is an ugly program that deserializes a handcrafted stream to produce two distinct instances of the flawed singleton. The deserialize method is omitted from this program because it’s identical to the one on page 354:

最后,这是一个有问题的程序,它反序列化了一个手工制作的流,以生成有缺陷的单例的两个不同实例。这个程序省略了反序列化方法,因为它与第 354 页的方法相同:

  1. public class ElvisImpersonator {
  2. // Byte stream couldn't have come from a real Elvis instance!
  3. private static final byte[] serializedForm = {
  4. (byte)0xac, (byte)0xed, 0x00, 0x05, 0x73, 0x72, 0x00, 0x05,
  5. 0x45, 0x6c, 0x76, 0x69, 0x73, (byte)0x84, (byte)0xe6,
  6. (byte)0x93, 0x33, (byte)0xc3, (byte)0xf4, (byte)0x8b,
  7. 0x32, 0x02, 0x00, 0x01, 0x4c, 0x00, 0x0d, 0x66, 0x61, 0x76,
  8. 0x6f, 0x72, 0x69, 0x74, 0x65, 0x53, 0x6f, 0x6e, 0x67, 0x73,
  9. 0x74, 0x00, 0x12, 0x4c, 0x6a, 0x61, 0x76, 0x61, 0x2f, 0x6c,
  10. 0x61, 0x6e, 0x67, 0x2f, 0x4f, 0x62, 0x6a, 0x65, 0x63, 0x74,
  11. 0x3b, 0x78, 0x70, 0x73, 0x72, 0x00, 0x0c, 0x45, 0x6c, 0x76,
  12. 0x69, 0x73, 0x53, 0x74, 0x65, 0x61, 0x6c, 0x65, 0x72, 0x00,
  13. 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x01,
  14. 0x4c, 0x00, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64,
  15. 0x74, 0x00, 0x07, 0x4c, 0x45, 0x6c, 0x76, 0x69, 0x73, 0x3b,
  16. 0x78, 0x70, 0x71, 0x00, 0x7e, 0x00, 0x02
  17. };
  18. public static void main(String[] args) {
  19. // Initializes ElvisStealer.impersonator and returns
  20. // the real Elvis (which is Elvis.INSTANCE)
  21. Elvis elvis = (Elvis) deserialize(serializedForm);
  22. Elvis impersonator = ElvisStealer.impersonator;
  23. elvis.printFavorites();
  24. impersonator.printFavorites();
  25. }
  26. }

Running this program produces the following output, conclusively proving that it’s possible to create two distinct Elvis instances (with different tastes in music):

运行此程序将生成以下输出,最终证明可以创建两个不同的 Elvis 实例(具有不同的音乐品味):

  1. [Hound Dog, Heartbreak Hotel]
  2. [A Fool Such as I]

You could fix the problem by declaring the favoriteSongs field transient, but you’re better off fixing it by making Elvis a single-element enum type (Item 3). As demonstrated by the ElvisStealer attack, using a readResolve method to prevent a “temporary” deserialized instance from being accessed by an attacker is fragile and demands great care.

通过将 favorites 字段声明为 transient 可以解决这个问题,但是最好把 Elvis 做成是一个单元素的枚举类型(Item-3)。ElvisStealer 所示的攻击表名,使用 readResolve 方法防止「temporary」反序列化实例被攻击者访问的方式是脆弱的,需要非常小心。

If you write your serializable instance-controlled class as an enum, Java guarantees you that there can be no instances besides the declared constants, unless an attacker abuses a privileged method such as AccessibleObject.setAccessible. Any attacker who can do that already has sufficient privileges to execute arbitrary native code, and all bets are off. Here’s how our Elvis example looks as an enum:

如果你将可序列化的实例控制类编写为枚举类型, Java 保证除了声明的常量之外不会有任何实例,除非攻击者滥用了特权方法,如 AccessibleObject.setAccessible。任何能够做到这一点的攻击者都已经拥有足够的特权来执行任意的本地代码,all bets are off。以下是把 Elvis 写成枚举的例子:

  1. // Enum singleton - the preferred approach
  2. public enum Elvis {
  3. INSTANCE;
  4. private String[] favoriteSongs ={ "Hound Dog", "Heartbreak Hotel" };
  5. public void printFavorites() {
  6. System.out.println(Arrays.toString(favoriteSongs));
  7. }
  8. }

The use of readResolve for instance control is not obsolete. If you have to write a serializable instance-controlled class whose instances are not known at compile time, you will not be able to represent the class as an enum type.

使用 readResolve 进行实例控制并不过时。如果必须编写一个可序列化的实例控制类,而该类的实例在编译时是未知的,则不能将该类表示为枚举类型。

The accessibility of readResolve is significant. If you place a readResolve method on a final class, it should be private. If you place a readResolve method on a nonfinal class, you must carefully consider its accessibility. If it is private, it will not apply to any subclasses. If it is packageprivate, it will apply only to subclasses in the same package. If it is protected or public, it will apply to all subclasses that do not override it. If a readResolve method is protected or public and a subclass does not override it, deserializing a subclass instance will produce a superclass instance, which is likely to cause a ClassCastException.

readResolve 的可访问性非常重要。 如果你将 readResolve 方法放在 final 类上,那么它应该是私有的。如果将 readResolve 方法放在非 final 类上,必须仔细考虑其可访问性。如果它是私有的,它将不应用于任何子类。如果它是包级私有的,它将只适用于同一包中的子类。如果它是受保护的或公共的,它将应用于不覆盖它的所有子类。如果 readResolve 方法是受保护的或公共的,而子类没有覆盖它,反序列化子类实例将生成超类实例,这可能会导致 ClassCastException。

To summarize, use enum types to enforce instance control invariants wherever possible. If this is not possible and you need a class to be both serializable and instance-controlled, you must provide a readResolve method and ensure that all of the class’s instance fields are either primitive or transient.

总之,在可能的情况下,使用枚举类型强制实例控制不变量。如果这是不可能的,并且你需要一个既可序列化又实例控制的类,那么你必须提供一个 readResolve 方法,并确保该类的所有实例字段都是基本类型,或使用 transient 修饰。


【90】考虑以序列化代理代替序列化实例

Consider serialization proxies instead of serialized instances

As mentioned in Items 85 and 86 and discussed throughout this chapter, the decision to implement Serializable increases the likelihood of bugs and security problems as it allows instances to be created using an extralinguistic mechanism in place of ordinary constructors. There is, however, a technique that greatly reduces these risks. This technique is known as the serialization proxy pattern.

正如在 Item-85 和 Item-86 中提到的贯穿本章的问题:实现 Serializable 接口的决定增加了出现 bug 和安全问题的可能性,因为它允许使用一种超语言机制来创建实例,而不是使用普通的构造函数。然而,有一种技术可以大大降低这些风险。这种技术称为序列化代理模式。

The serialization proxy pattern is reasonably straightforward. First, design a private static nested class that concisely represents the logical state of an instance of the enclosing class. This nested class is known as the serialization proxy of the enclosing class. It should have a single constructor, whose parameter type is the enclosing class. This constructor merely copies the data from its argument: it need not do any consistency checking or defensive copying. By design, the default serialized form of the serialization proxy is the perfect serialized form of the enclosing class. Both the enclosing class and its serialization proxy must be declared to implement Serializable.

序列化代理模式相当简单。首先,设计一个私有静态嵌套类,它简洁地表示外围类实例的逻辑状态。这个嵌套类称为外围类的序列化代理。它应该有一个构造函数,其参数类型是外围类。这个构造函数只是从它的参数复制数据:它不需要做任何一致性检查或防御性复制。按照设计,序列化代理的默认序列化形式是外围类的完美序列化形式。外围类及其序列代理都必须声明实现 Serializable 接口。

For example, consider the immutable Period class written in Item 50 and made serializable in Item 88. Here is a serialization proxy for this class. Period is so simple that its serialization proxy has exactly the same fields as the class:

例如,考虑 Item-50 中编写的不可变 Period 类,并在 Item-88 中使其可序列化。这是该类的序列化代理。Period 非常简单,它的序列化代理具有与类完全相同的字段:

  1. // Serialization proxy for Period class
  2. private static class SerializationProxy implements Serializable {
  3. private final Date start;
  4. private final Date end;
  5. SerializationProxy(Period p) {
  6. this.start = p.start;
  7. this.end = p.end;
  8. }
  9. private static final long serialVersionUID =234098243823485285L; // Any number will do (Item 87)
  10. }

Next, add the following writeReplace method to the enclosing class. This method can be copied verbatim into any class with a serialization proxy:

接下来,将以下 writeReplace 方法添加到外围类中。通过序列化代理,这个方法可以被逐字地复制到任何类中:

  1. // writeReplace method for the serialization proxy pattern
  2. private Object writeReplace() {
  3. return new SerializationProxy(this);
  4. }

The presence of this method on the enclosing class causes the serialization system to emit a SerializationProxy instance instead of an instance of the enclosing class. In other words, the writeReplace method translates an instance of the enclosing class to its serialization proxy prior to serialization.

该方法存在于外围类,导致序列化系统产生 SerializationProxy 实例,而不是外围类的实例。换句话说,writeReplace 方法在序列化之前将外围类的实例转换为其序列化代理。

With this writeReplace method in place, the serialization system will never generate a serialized instance of the enclosing class, but an attacker might fabricate one in an attempt to violate the class’s invariants. To guarantee that such an attack would fail, merely add this readObject method to the enclosing class:

有了这个 writeReplace 方法,序列化系统将永远不会生成外围类的序列化实例,但是攻击者可能会创建一个实例,试图违反类的不变性。为了保证这样的攻击会失败,只需将这个 readObject 方法添加到外围类中:

  1. // readObject method for the serialization proxy pattern
  2. private void readObject(ObjectInputStream stream) throws InvalidObjectException {
  3. throw new InvalidObjectException("Proxy required");
  4. }

Finally, provide a readResolve method on the SerializationProxy class that returns a logically equivalent instance of the enclosing class. The presence of this method causes the serialization system to translate the serialization proxy back into an instance of the enclosing class upon deserialization.

最后,在 SerializationProxy 类上提供一个 readResolve 方法,该方法返回外围类的逻辑等效实例。此方法的存在导致序列化系统在反序列化时将序列化代理转换回外围类的实例。

This readResolve method creates an instance of the enclosing class using only its public API and therein lies the beauty of the pattern. It largely eliminates the extralinguistic character of serialization, because the deserialized instance is created using the same constructors, static factories, and methods as any other instance. This frees you from having to separately ensure that deserialized instances obey the class’s invariants. If the class’s static factories or constructors establish these invariants and its instance methods maintain them, you’ve ensured that the invariants will be maintained by serialization as well.

这个 readResolve 方法仅使用其公共 API 创建了一个外围类的实例,这就是该模式的美妙之处。它在很大程度上消除了序列化的语言外特性,因为反序列化实例是使用与任何其他实例相同的构造函数、静态工厂和方法创建的。这使你不必单独确保反序列化的实例遵从类的不变性。如果类的静态工厂或构造函数建立了这些不变性,而它的实例方法维护它们,那么你就确保了这些不变性也将通过序列化来维护。

Here is the readResolve method for Period.SerializationProxy above:

以下是上述 Period.SerializationProxy 的 readResolve 方法:

  1. // readResolve method for Period.SerializationProxy
  2. private Object readResolve() {
  3. return new Period(start, end); // Uses public constructor
  4. }

Like the defensive copying approach (page 357), the serialization proxy approach stops the bogus byte-stream attack (page 354) and the internal field theft attack (page 356) dead in their tracks. Unlike the two previous approaches, this one allows the fields of Period to be final, which is required in order for the Period class to be truly immutable (Item 17). And unlike the two previous approaches, this one doesn’t involve a great deal of thought. You don’t have to figure out which fields might be compromised by devious serialization attacks, nor do you have to explicitly perform validity checking as part of deserialization.

与防御性复制方法(第 357 页)类似,序列化代理方法阻止伪字节流攻击(第 354 页)和内部字段盗窃攻击(第 356 页)。与前两种方法不同,这种方法允许 Period 的字段为 final,这是 Period 类真正不可变所必需的(Item-17)。与前两种方法不同,这一种方法不需要太多的思考。你不必指出哪些字段可能会受到狡猾的序列化攻击的危害,也不必显式地执行有效性检查作为反序列化的一部分。

There is another way in which the serialization proxy pattern is more powerful than defensive copying in readObject. The serialization proxy pattern allows the deserialized instance to have a different class from the originally serialized instance. You might not think that this would be useful in practice, but it is.

序列化代理模式还有另一种方式比 readObject 中的防御性复制更强大。序列化代理模式允许反序列化实例具有与初始序列化实例不同的类。你可能不认为这在实践中有用,但它确实有用。

Consider the case of EnumSet (Item 36). This class has no public constructors, only static factories. From the client’s perspective, they return EnumSet instances, but in the current OpenJDK implementation, they return one of two subclasses, depending on the size of the underlying enum type. If the underlying enum type has sixty-four or fewer elements, the static factories return a RegularEnumSet; otherwise, they return a JumboEnumSet.

考虑 EnumSet 的情况(Item-36)。该类没有公共构造函数,只有静态工厂。从客户端的角度来看,它们返回 EnumSet 实例,但是在当前 OpenJDK 实现中,它们返回两个子类中的一个,具体取决于底层枚举类型的大小。如果底层枚举类型有 64 个或更少的元素,则静态工厂返回一个 RegularEnumSet;否则,它们返回一个 JumboEnumSet。

Now consider what happens if you serialize an enum set whose enum type has sixty elements, then add five more elements to the enum type, and then deserialize the enum set. It was a RegularEnumSet instance when it was serialized, but it had better be a JumboEnumSet instance once it is deserialized. In fact that’s exactly what happens, because EnumSet uses the serialization proxy pattern. In case you’re curious, here is EnumSet’s serialization proxy. It really is this simple:

现在考虑,如果序列化一个枚举集合,它的枚举类型有 60 个元素,然后给这个枚举类型再增加 5 个元素,之后反序列化这个枚举集合。当它被序列化的时候,返回 RegularEnumSet 实例,但最好是 JumboEnumSet 实例。事实上正是这样,因为 EnumSet 使用序列化代理模式。如果你好奇,这里是 EnumSet 的序列化代理。其实很简单:

  1. // EnumSet's serialization proxy
  2. private static class SerializationProxy <E extends Enum<E>> implements Serializable {
  3. // The element type of this enum set.
  4. private final Class<E> elementType;
  5. // The elements contained in this enum set.
  6. private final Enum<?>[] elements;
  7. SerializationProxy(EnumSet<E> set) {
  8. elementType = set.elementType;
  9. elements = set.toArray(new Enum<?>[0]);
  10. }
  11. private Object readResolve() {
  12. EnumSet<E> result = EnumSet.noneOf(elementType);
  13. for (Enum<?> e : elements)
  14. result.add((E)e);
  15. return result;
  16. }
  17. private static final long serialVersionUID =362491234563181265L;
  18. }

The serialization proxy pattern has two limitations. It is not compatible with classes that are extendable by their users (Item 19). Also, it is not compatible with some classes whose object graphs contain circularities: if you attempt to invoke a method on such an object from within its serialization proxy’s readResolve method, you’ll get a ClassCastException because you don’t have the object yet, only its serialization proxy.

序列化代理模式有两个限制。它与客户端可扩展的类不兼容(Item-19)。而且,它也不能与对象图中包含循环的某些类兼容:如果你试图从对象的序列化代理的 readResolve 方法中调用对象上的方法,你将得到一个 ClassCastException,因为你还没有对象,只有对象的序列化代理。

Finally, the added power and safety of the serialization proxy pattern are not free. On my machine, it is 14 percent more expensive to serialize and deserialize Period instances with serialization proxies than it is with defensive copying.

最后,序列化代理模式所增强的功能和安全性并不是没有代价的。在我的机器上,通过序列化代理来序列化和反序列化 Period 实例的开销,比用保护性拷贝进行的开销增加了 14%。

In summary, consider the serialization proxy pattern whenever you find yourself having to write a readObject or writeObject method on a class that is not extendable by its clients. This pattern is perhaps the easiest way to robustly serialize objects with nontrivial invariants.

总之,当你发现必须在客户端不可扩展的类上编写 readObject 或 writeObject 方法时,请考虑序列化代理模式。要想稳健地将带有重要约束条件的对象序列化时,这种模式可能是最容易的方法。