A summary of OASIS standard advantages and weaknesses is presented, based on six years of experience with customer databases.
By Dr PHILIPPE MOREY-CHAISEMARTIN, FREDERIC BRAULT, XYALIS, Grenoble, France
Six years ago, as OASIS was introduced, we published an article highlighting why it was a positive replacement for GDSII. Since then, users have started adopting OASIS in their flows, with benefits and disadvantages. One of the strengths of OASIS is its flexibility (unlimited coordinate precision, unlimited number of layers, etc.). But this flexibility puts unnecessary stress on layout database processing tools, in terms of memory consumption and computing time.
A new standard, OASIS.MASK, is being introduced to address the requirements specific to photomask layout representation. This new standard, a subset of OASIS, and as such fully compliant with it, introduces constraints that reflect the real-world limitations of mask manufacturing (such as a limited number of masks per process) that make the interpretation and exploitation of the OASIS.MASK databases more efficient and reliable.
Based on six years of experience with customer databases, this paper presents a summary of the advantages and weaknesses of the OASIS standard. It reviews the constraints introduced in OASIS.MASK and highlights the benefits of using the new format. Some critical points related to overly restrictive limitations are pointed out for consideration in future versions of the standard, and suggestions to improve its usability are detailed. Finally, the paper explains how the global flow from design to mask can be improved by the introduction of this format.
OASIS: a review
GDSII was introduced by Calma in 1978 as a successor to the GDS format created in 1971. For 30 years, no major change has been made to this de facto standard, while chip complexity has been multiplied by 10^5 to 10^6. As a result, the numerical values required to describe the geometries of nanoscale structures on 300mm wafers have reached the 32-bit limit of the GDSII format. The size of GDSII files is also becoming a problem, and compression is not always the best solution.
To address such issues, the OASIS format (officially SEMI P39) was developed and its first official specification was released in 2004.
GDSII was originally created for sequential file access. It is still called a “stream format” and was designed for magnetic tapes, which were the only media at the time able to store large amounts of data. Designers still call the release milestone a “Tape Out,” even though most people have forgotten (or never known!) what an actual tape was.
Nowadays, sequential access is not an interesting property anymore and one of the new features offered by OASIS was to give direct access to parts of the file. To allow this, OASIS offers the possibility to store indexes in reference tables either at the beginning or the end of file.
In addition to faster random access, the goals of OASIS were to reduce file size and remove some critical limitations such as numerical precision. To meet this second requirement, the original restriction of 32 bits for all integer values has been removed. In the same way, the original GDSII limitation to 256 layers and 256 datatypes has been lifted.
To reduce file size, many different optimization methods have been implemented. The main idea is to avoid information repetition. For example, when reading a record in an OASIS file, a simple flag can tell the parser that the required information (cell name, layer, coordinate, etc.) is the same as for the previous record.
Another optimization is to provide a large number of different record types, which reduces the amount of information contained in the record itself. For example, in GDSII all shapes are identified by the same record type: BOUNDARY. It is then mandatory to list the coordinates of all vertices to fully describe the shape. In OASIS, a square is not defined as a generic BOUNDARY record, but as a special type. The only required information is the coordinates of the lower left corner and the size of the square. This is true for most common configurations, which have their own record type: there are 26 different types of trapezoids (including the square) and 11 types of repetitions (including standard arrays). The displacement between 2 coordinates also comes in different types: pure horizontal displacement, pure vertical displacement, diagonal, etc. So, in most cases, only 1 value is required instead of a delta-X and a delta-Y.
This makes OASIS files really compact. Considering that compression algorithms are based on the factorization of repetitions in a file, it is easy to understand that the compression ratio of an OASIS file with standard tools like gzip or bzip is not very high. OASIS files are compressed by construction and it is usually a bad idea to zip them again as you would then lose the first benefit of the format: random access.
How has it been used?
In OASIS, reference to an object can most of the time be made either by its name or by a reference number. For example, a PLACEMENT record can call a cell by its cell name, or directly by a reference number. When thousands of cells are declared and instantiated, using reference numbers can really reduce the size of the resulting OASIS file. When reference numbers are used (for cell names, strings, properties, etc.), the file must contain declarations that associate a reference number with a character string. According to the OASIS specification, these declarations can appear anywhere in the file. This implies that, in general, OASIS files cannot be properly handled until they have been fully parsed by the reader.
To improve access times, OASIS also allows these declarations to be written contiguously in “tables”. The START record of the file then contains the offset of these tables, which can be directly accessed by the reader. As a convenience for OASIS writers, the offsets can also be put in the END record: this way, the file can be written sequentially, without having to “patch” the START record afterwards (which is hard to do when writing a compressed stream).
In the latter case, the reader now has to seek to the end of the file to know the table offsets. This becomes a problem when using an external tool to compress the file (such as zip), as most compression schemes do not allow random access.
OASIS strict mode is a way to take advantage of reference tables by enforcing their use. For each reference table (cell names, layer names, property names, etc.), when strict mode is set, all references to an object must be made by reference number. Also, the specification guarantees that all declarations that associate a reference number with a name must be made in a contiguous table, whose offset is provided in either the START or END record.
The goal of strict mode is to allow fast random access in an OASIS file. A reader that takes advantage of it can start working on a file before having read it all, and can access some portions of the file directly, without having to read it sequentially. However, strict mode is optional, and, according to the specification, readers can choose not to take advantage of it, in which case they are not required to raise errors related to strict mode violation. This leads to situations where a file can appear valid to one reader, and invalid to another. More problematic are some databases we have seen, where the same reference number was associated with different names, at different places in the file. A reader taking advantage of strict mode would only see one declaration and continue working, although the file was ambiguous and invalid.
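The ambiguity described above can be caught with a simple consistency check. The following is a minimal sketch, assuming the name declarations have already been parsed into (reference number, name, file offset) tuples; that tuple layout and the function name find_conflicting_refs are illustrative choices, not part of the OASIS specification.

```python
from collections import defaultdict


def find_conflicting_refs(declarations):
    """Return {refnum: sorted names} for numbers bound to more than one name.

    `declarations` is an iterable of (refnum, name, offset) tuples, a
    hypothetical in-memory form of the name records found in a file.
    """
    names_by_ref = defaultdict(set)
    for refnum, name, _offset in declarations:
        names_by_ref[refnum].add(name)
    return {
        ref: sorted(names)
        for ref, names in names_by_ref.items()
        if len(names) > 1
    }


# Hypothetical declarations: reference number 7 is declared twice,
# with different names, at two distant places in the file.
decls = [(7, "NAND2", 120), (8, "INV", 140), (7, "NOR2", 98_304)]
conflicts = find_conflicting_refs(decls)
print(conflicts)
```

Such a check requires scanning every declaration in the file, which is exactly the work a strict-mode reader would like to avoid; it is probably best run once, when the file is received.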
OASIS CBLOCK records provide a way to embed compressed data within an OASIS file. Contrary to using an external tool on the whole file, they allow both compression and efficient access. As of today, the only compression method supported is the lossless DEFLATE format (also found in zip). Since design files are getting larger and now routinely weigh tens of gigabytes, OASIS should probably consider including newer and more efficient compression algorithms.
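The difference between block-wise and whole-file compression can be illustrated with Python's zlib module. This is only a sketch of the principle: the payloads, the offset list and the container layout are invented for the example and do not reproduce the actual CBLOCK record layout.

```python
import zlib


def deflate_raw(data: bytes) -> bytes:
    """Compress with raw DEFLATE (no zlib or gzip header), via wbits=-15."""
    comp = zlib.compressobj(level=9, wbits=-15)
    return comp.compress(data) + comp.flush()


def inflate_raw(blob: bytes) -> bytes:
    """Inverse of deflate_raw."""
    return zlib.decompress(blob, wbits=-15)


# Hypothetical payloads standing in for the records of three cells.
cells = [bytes([i]) * 5000 + b"cell-%d" % i for i in range(3)]

# Compress each cell separately and remember where each block starts.
blocks = [deflate_raw(c) for c in cells]
offsets = []
pos = 0
for block in blocks:
    offsets.append((pos, len(block)))
    pos += len(block)
container = b"".join(blocks)

# Random access: inflate only the third block, without touching the others.
start, length = offsets[2]
third = inflate_raw(container[start:start + length])
```

Had the three payloads been compressed as a single stream, reaching the third one would have required inflating everything before it.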
OASIS allows three different validation schemes: no validation, a CRC checksum or a byte sum. If selected, the validation writes 32 bits in the END record of the file. As its name implies, the goal of the validation scheme is to detect errors or corruption (due to file system conversion, transfers, etc.). The byte sum is a simple sum of all the bytes contained in the file, truncated to 32 bits. It is easy to implement but, as the OASIS specification itself admits, it is not very efficient at detecting errors. The CRC checksum is a common 32-bit polynomial Cyclic Redundancy Check. It is the only validation scheme that really detects errors, but it comes at the cost of being byte-order dependent: the file must be written and read sequentially.
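The weakness of the byte sum is easy to demonstrate. In the sketch below, zlib.crc32 stands in for the CRC scheme; whether its parameters match those of the OASIS specification exactly is an assumption, and the sample bytes are invented.

```python
import zlib


def byte_sum32(data: bytes) -> int:
    """Sum of all bytes, truncated to 32 bits."""
    return sum(data) & 0xFFFFFFFF


original = b"RECTANGLE x=100 y=200"
# Simulated corruption: two adjacent bytes have been swapped ("20" -> "02").
corrupted = b"RECTANGLE x=100 y=020"

same_sum = byte_sum32(original) == byte_sum32(corrupted)
same_crc = zlib.crc32(original) == zlib.crc32(corrupted)
print("byte sum unchanged:", same_sum)
print("CRC unchanged:", same_crc)
```

Because addition is commutative, any reordering of bytes leaves the byte sum untouched, whereas the CRC depends on the position of every byte.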
Unlike MD5 or the now recommended SHA cryptographic functions, it is worth noting that none of the OASIS validation schemes offer real “tamper-proof” checksums. It is really easy to create two OASIS files with the same checksums, but different content.
Sadly, it seems that many tools turn off validation by default. Users then have to resort to external tools for error detection (such as Unix cksum).
Advantages and weaknesses
The OASIS specification defines a format for integers that support unlimited precision: numbers can be arbitrarily long.
To fully implement unlimited precision, OASIS compliant software should use specialized math libraries that are able to perform operations of arbitrary precision. However, the cost of these algorithms becomes an issue for compute intensive code running on big databases.
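A minimal sketch of a variable-length integer encoding of this kind: seven payload bits per byte, least-significant group first, with the top bit of each byte flagging that more bytes follow. The exact bit layout should be checked against SEMI P39; the function names are illustrative. Python integers are arbitrary-precision, so no extra library is needed here, which is not the case in most compiled languages.

```python
def encode_uint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low group first."""
    if value < 0:
        raise ValueError("unsigned encoding only")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(group)
            return bytes(out)


def decode_uint(data: bytes) -> int:
    """Decode one integer from the start of `data`."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            break
    return result


def fits_in_64_bits(value: int) -> bool:
    """True when a reader using native unsigned 64-bit integers can hold it."""
    return 0 <= value < 2 ** 64


huge = 2 ** 100 + 1  # legal in the encoding, but beyond native precision
print(len(encode_uint(huge)), fits_in_64_bits(huge))
```

A reader limited to native integers can use a test like fits_in_64_bits to reject such values explicitly rather than silently truncating them.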
In practice, many implementations choose to use the computer’s native precision (usually 64 bits nowadays). To prevent divergence in interpretation, the OASIS format should probably evolve to restrict the precision of integers to 64 bits. A precision of 32 bits was definitely not enough, but exceeding 64 bits is unlikely to be useful. As an example, the edge of an elementary silicon crystal cell measures 0.543nm. With this value as the grid step, a 64-bit integer can express exact coordinates over a span of about 10 million km (2^64 x 0.543nm), several hundred times the Earth’s diameter!
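The span addressable on a 0.543nm grid can be checked with a few lines of arithmetic. The Earth diameter used for comparison is an approximate mean value added for this example.

```python
LATTICE_NM = 0.543          # silicon unit-cell edge quoted in the text
EARTH_DIAMETER_KM = 12_742  # approximate mean diameter (assumption)


def span_km(bits: int, grid_nm: float = LATTICE_NM) -> float:
    """Length covered by 2**bits grid steps, in kilometres."""
    metres = (2 ** bits) * grid_nm * 1e-9
    return metres / 1e3


print(f"32 bits: {span_km(32) * 1e3:.2f} m")
print(f"64 bits: {span_km(64):.3e} km")
print(f"ratio to Earth diameter: {span_km(64) / EARTH_DIAMETER_KM:.0f}")
```

At this grid step, 32 bits cover only a couple of metres, while 64 bits cover a distance far beyond any manufacturing need.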
For floating point numbers, OASIS specifies 8 different formats. Some can express exact mathematical values. It is thus possible to represent the exact value of the fraction 1/3 in OASIS, even though a computer cannot use it natively. For practical purposes, it seems that OASIS could use only two of these formats: the ones based on IEEE 754 floating point representation, already used by all software and implemented natively by most processors.
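The difference between an exact ratio and its IEEE 754 approximation can be seen with Python's standard library. This sketch only illustrates the numerical point; the little-endian byte order used for packing is an assumption and not a statement about the OASIS file layout.

```python
import struct
from fractions import Fraction

third = Fraction(1, 3)      # exact ratio, as a ratio-based format can store
as_double = float(third)    # nearest IEEE 754 double-precision value

# The 8 bytes a double-precision encoding would carry.
encoded = struct.pack("<d", as_double)

# The double is close to, but not equal to, one third.
error = abs(Fraction(as_double) - third)
print(encoded.hex(), float(error))
```

The error is tiny compared with any layout grid, which supports the view that the two IEEE 754 formats are sufficient in practice.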
Choosing a description methodology
A big advantage of OASIS is that there is a lot of choice: 8 formats to choose from when expressing a floating point value, 11 repetition types for PLACEMENT records, reference by name or by number for cells, text, properties, etc.
Most of the time, the OASIS specification does not impose or even recommend any of the different options. In the end, all this freedom becomes a problem, and most tools stick to a small subset that they use all the time.
Our recommendation would be to use the following subset of OASIS, which has proven to be the most robust in our experience:
- Use strict mode whenever possible, and make all references by reference number (and not by name).
- Use CBlocks rather than compressing the whole file.
- Use CRC checksum to detect errors.
- Never write numbers that cannot be represented as 64 bits integers.
- Use type 6 or 7 for floating point numbers (IEEE 754 single or double precision).
- When in doubt, use type 8 repetitions (for arrays), or type 10 (for random repetitions): these are the most generic.
- Try to use rotations that are multiples of 90 degrees. A rotation of 12.34 degrees does not make much sense in a real mask!
To further optimize the OASIS file, one can take advantage of modal variables. According to the specification, these modal variables implicitly store the state of the preceding element. For example, they are used to instantiate a polygon in the same layer as the previously instantiated polygon. When instantiating one thousand polygons in the same layer, the layer number does not have to be repeated one thousand times. At the scale of a complete database, this can lead to real gains in file size.
In order to fully take advantage of modal variables, the OASIS writer should sort the elements as efficiently as possible to form groups that share common parameters.
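The effect of sorting on modal variables can be estimated with a toy model. The sketch below assumes that each shape is a (layer, geometry) pair and that a record needs an explicit layer number only when its layer differs from the previous one; it ignores every other modal variable.

```python
def count_explicit_layers(shapes):
    """Count the records that must carry an explicit layer number."""
    count = 0
    modal_layer = None  # layer remembered from the previous record
    for layer, _geometry in shapes:
        if layer != modal_layer:
            count += 1
            modal_layer = layer
    return count


# Hypothetical shapes, alternating between two layers.
shapes = [(1, "a"), (2, "b"), (1, "c"), (2, "d"), (1, "e")]

unsorted_cost = count_explicit_layers(shapes)
sorted_cost = count_explicit_layers(sorted(shapes, key=lambda s: s[0]))
print(unsorted_cost, sorted_cost)
```

A real writer has to balance several parameters at once (layer, datatype, dimensions, position), so the best ordering is less obvious than a single sort key.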
OASIS.MASK: the missing link
On the one hand, OASIS has been a big step forward for physical design description compared to GDSII, but on the other hand, photomask description is still using an old format called MEBES. This format was created in the early seventies by ETEC, and has been updated a few times since, from Mode-1 to the current Mode-5 version. The original Mode-1 format was definitely not suitable for describing large chips with a precision of 1nm or even less.
Most of the foundries and mask shops worldwide are using MEBES files, but some other formats like JEOL, VSB, MIC, HL, DXF or Gerber are also available. These formats are proprietary and are not “official” maintained standards. So, as for the GDSII replacement, the basic idea was to set up a new standard for mask layout description. Mask layout and design layout are luckily not so far from each other, and the other great idea was to use the same format: OASIS. But, even if the final physical representation should be the same, photomask description has some particularities and a straightforward usage of OASIS for this type of application was not really possible. So, SEMI P44, also known as OASIS.MASK, was born.
It is important to notice that OASIS.MASK is not really a new format. It is a formal subset of OASIS, and as such, an OASIS.MASK file can be read by any standard OASIS parser. It mostly adds some constraints compared to the initial format, but also provides several extensions to simplify its manipulation by mask data preparation tools.
While a design database is a hierarchical description based on the functions used in a chip, an OASIS.MASK file is a description of the layout based on the topology. It still keeps a hierarchy, but limits it to 3 levels. The first level is the top cell, the second level is made of so-called localization areas and the third and last level is made of small cells named common cells. Localization areas do not have any relationship with the original functional hierarchy. They are just rectangular areas at a given place in the full layout. OASIS.MASK may then be seen as a flattened view of the chip, simply divided into windows.
This is more or less what is done in a MEBES file in which the original hierarchy has completely vanished. In such formats as MEBES, the hierarchy is purely topological: the first level is made of columns (the segments) and the second level is made of rows (the stripes). But as all the stripes and the segments are different, it is not possible to share any cell declaration, so the file is a fully flat description and is thus very big, nowadays sometimes exceeding a terabyte.
In all large chips, there is an intensive usage of library cells. All basic gates or memory points are called millions of times but only need to be described once. This has been taken into account in OASIS.MASK in which only the upper level cells of the functional hierarchy need to be flattened. All “small” cells may remain unchanged and are called from the intermediate level of the hierarchy. These small cells are called common cells. Their usage drastically reduces the final file size.
FIGURE 1 represents the functional hierarchy of a chip as described in a standard OASIS file. FIGURE 2 represents the same chip with the topological hierarchy in OASIS.MASK format. We can see that all small (common) repeated cells are described the same number of times in both representations, while large cells have been flattened. Flattening big functional blocks has no major impact on the final size as these cells usually appear at most a few times in the chip.
Being a subset of OASIS, OASIS.MASK introduces some restrictions on the original format. Most of them are to be expected, but some are more surprising. Although they might change in the future, here is a list of the current restrictions imposed by the specification:
- Cell names and properties can only include ASCII letters, numbers and underscore (63 characters to choose from)
- File names cannot be more than 64 bytes long (256 including directory), and must be composed of only these 63 characters (plus the period)
- Hierarchy is limited to 3 levels
- Strict mode is mandatory (all references are made by numbers, encoded as 32 bits integers)
- Validation checksum is mandatory (but sadly, byte sum is allowed, although ineffective)
- Names cannot be longer than 256 bytes (cell name, layer name, property name, …)
- Coordinate values for PLACEMENT records are limited to 32 bits signed integers
- Only repetitions 0-3 (regular 2D arrays) are allowed (and sadly not the more generic types 8 and 10)
- Magnification, mirroring or rotation are disabled
- There can only be one layer per file
- Layer numbers are limited to 256, and datatypes are ignored.
- Only rectangles and trapezoids are allowed (no paths or generic polygons)
- The bounding box for all cells must be described (using OASIS S_BOUNDING_BOX property)
- A cell cannot be larger than 1 square millimeter (except Top Cell)
- The file offset of all cells must be given (using the S_CELL_OFFSET property, encoded as a 64 bits integer)
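Some of the restrictions listed above lend themselves to a mechanical check before a file is written. The sketch below covers only three of them (allowed characters, name length, 32-bit placement coordinates); the function names and the returned messages are illustrative and do not come from the specification.

```python
import re

# 26 upper-case + 26 lower-case letters + 10 digits + underscore = 63 characters
NAME_RE = re.compile(r"[A-Za-z0-9_]+")
INT32_MIN, INT32_MAX = -2 ** 31, 2 ** 31 - 1


def check_name(name):
    """Return the list of restrictions violated by a cell or property name."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("characters outside A-Z, a-z, 0-9 and '_'")
    if len(name.encode("utf-8")) > 256:
        problems.append("longer than 256 bytes")
    return problems


def check_placement(x, y):
    """Return the axes whose coordinate does not fit a signed 32-bit integer."""
    return [
        axis
        for axis, value in (("x", x), ("y", y))
        if not INT32_MIN <= value <= INT32_MAX
    ]


print(check_name("TOP_CELL_1"), check_name("cell-1"))
print(check_placement(0, 2 ** 31))
```

Running such checks in the writer, rather than discovering violations in a downstream reader, keeps the failure close to its cause.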
As stated in the OASIS.MASK specification, these constraints may be relaxed in the future. For example, given the pace at which mask complexity evolves, we wouldn’t be surprised to see 32 bits integers get promoted to 64 bits in the years to come.
OASIS.MASK extensions are implemented as plain OASIS properties, or remain compatible with the regular OASIS format. To make them easily identifiable, all OASIS.MASK related properties are prefixed with the SEMI standard’s number: P44.
As already discussed, the major improvement of OASIS.MASK over OASIS is the introduction of localization areas. They are implemented with the P44_LOCALIZATION and P44_LOCALIZATION_AREA properties.
As its name implies, the P44_COMMON_CELL property is used to declare common cells.
The P44_GEOMETRY_OFFSET_AVAILABLE and P44_GEOMETRY_OFFSET properties allow fast access to the geometry records of every cell (one can directly jump to the declaration of rectangles and trapezoids, skipping all PLACEMENT records).
The P44_TOP_CELL_NUMBER property declares the reference number of the top cell. Since the P44_LOCALIZATION property also contains the top cell record offset in the file, finding the top cell in an OASIS.MASK file becomes very easy (and very fast), contrary to a generic OASIS file where, in the worst case, the entire file must be read and analyzed before the top cell can be determined.
Other properties are also defined in the OASIS.MASK specification (such as P44_FILE_SIZE, P44_CHIP_WINDOW, P44_BOUNDING_BOX_MAX, etc.).
Most of these properties are mandatory and, when correctly used, they drastically speed up file access and topological extraction of mask areas.
In almost all design flows, the designer runs a final Design Rule Check and a final LVS (Layout vs. Schematic) on the database before sending it for mask making. This mandatory step is considered fully reliable. But, unfortunately, it appears that some types of description are subject to interpretation. We have identified a couple of configurations for which the result may vary depending on the tool. We cannot say that one tool is better than another; we can only observe that their interpretations differ for things which are not fully specified in the standard. A typical example is a path with a segment shorter than the path width. See FIGURE 3.
The good news is that such issues are much less frequent in OASIS, when compared to the former GDSII standard. But some ambiguities still remain, and a database certified as correct by a DRC, may lead to an error on the silicon because the mask data preparation tool and the layout tool have different interpretations for such corner cases.
It is thus of utmost importance to perform the verification at the last possible step in the flow and on a format without any ambiguity. OASIS.MASK is an OASIS subset, so any tool able to read OASIS files is able to read an OASIS.MASK database. This means that a design rule check can be executed on an OASIS.MASK file.
The OASIS.MASK file will be used for post layout data processing and it will be the result of a conversion from the designer database. It is important to be able to run a verification on the converted file. Additionally, there is no risk of misinterpretation since all the geometries contained in an OASIS.MASK file are simple trapezoids.
But in the current state of the OASIS.MASK specification, we face several problems. The first one is related to layers: an OASIS.MASK file should not contain more than one layer. This makes the full verification much more complex. It is possible to verify the design rules for each layer, one after the other, but to check the inter-layer rules, different layers must be seen at the same time. To perform the DRC, we must load multiple OASIS.MASK files at the same time, each of them containing a single layer. Additionally, datatype usage is not clear: datatypes are widely used in OASIS files, but are specified as ignored in OASIS.MASK. This should only concern the mask writing equipment, and we may consider that, being present in the file, the datatypes may still be usable by other applications. These limitations and differences in layers and datatypes between both formats imply that we cannot use the same DRC deck to verify the design database and its converted version in OASIS.MASK format.
In practice there are also different lists of layers/datatypes: CAD layers and datatypes used by designers, and mask layers used for mask making and wafer processing. There is of course a relationship between both and mask layers can always be deduced from CAD layers and datatypes, while the opposite is not necessarily true. Designers do not draw all the layers and datatypes. Some process layers are computed from others by boolean operation (AND, OR, XOR, NOT). Design engineers also use some recognition layers or datatypes which are not processed, but still defined to simplify device identification. The DRC or LVS decks may need to take these differences into account in order to work on the OASIS.MASK file.
Adding more complexity by restricting files to a single layer and no datatypes seems like an unnecessary burden, especially if we consider that allowing multiple layers and datatypes per file comes at almost no cost in terms of implementation.
The second issue is related to the hierarchy. As mentioned before, a design database has most of the time a functional hierarchy. This hierarchy should follow the schematic hierarchy and the layout vs. schematic verification is performed hierarchically. Each cell in the schematic should match a cell in the layout, and the interconnections between them should match the connections drawn at a higher level of the hierarchy. Working on the hierarchy speeds up the process, and makes it easier to identify and localize errors.
In an OASIS.MASK file, the functional hierarchy is replaced by a topological hierarchy, which makes it completely different from the schematic. Only “small” cells may remain. This means that a hierarchical LVS will be able to identify all gates and library cells, and match them against the schematic, but all the interconnections at a higher level will be processed flat. This is, of course, possible, but it will lead to a dramatic increase in processing time.
With advanced process nodes, layout requires some post-processing to meet manufacturing constraints. The two major operations to be performed are “dummification” and OPC (Optical Proximity Correction). Dummification consists of inserting small geometries in empty areas of some layers in order to guarantee a homogeneous density across the whole chip. The goal is to obtain optimal flatness during chemical-mechanical planarization steps (See ). Usually the inserted “dummies” are simple polygons that can be described by a single or at most a few trapezoids. The insertion is made flat at the top level of the design as it depends on the global topology of the layout. So dummification can easily be performed on an OASIS.MASK file.
Additionally, the insertion of dummy geometries is based on a density analysis of the layer. This analysis is made on a window basis and requires a flat view of the top level of the layout. When using a regular OASIS file, flattening a full chip requires a huge amount of memory, although only one small part of the drawing (the window) is processed at a time. The OASIS.MASK format, which is topological by design, is definitely more convenient for such an operation. The physical representation of the chip has already been flattened and split into small areas. Computing the density on pre-defined windows only requires loading a few localization areas. It is then very easy to insert the required trapezoids in these areas to reach the expected density. This is performed locally, without any impact on other areas or on common cells. This also means that the process can easily be parallelized.
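A window-based density analysis can be sketched as follows. The code assumes axis-aligned, non-overlapping rectangles given as (x, y, width, height) with non-negative integer coordinates, and chip dimensions that are multiples of the window size; real layers contain trapezoids and overlaps, which this illustration ignores.

```python
def window_density(rects, chip_w, chip_h, win):
    """Return {(col, row): density} for square windows of side `win`."""
    cols = -(-chip_w // win)  # ceiling division
    rows = -(-chip_h // win)
    area = {(c, r): 0 for c in range(cols) for r in range(rows)}
    for x, y, w, h in rects:
        first_col, last_col = x // win, min(cols - 1, (x + w - 1) // win)
        first_row, last_row = y // win, min(rows - 1, (y + h - 1) // win)
        for c in range(first_col, last_col + 1):
            for r in range(first_row, last_row + 1):
                # Overlap between the rectangle and window (c, r).
                ox = min(x + w, (c + 1) * win) - max(x, c * win)
                oy = min(y + h, (r + 1) * win) - max(y, r * win)
                area[(c, r)] += ox * oy
    return {key: a / (win * win) for key, a in area.items()}


# Hypothetical layer: two rectangles on a 20 x 10 chip, windows of side 10.
rects = [(0, 0, 5, 10), (8, 0, 4, 10)]
density = window_density(rects, chip_w=20, chip_h=10, win=10)
print(density)
```

Each window depends only on the shapes that overlap it, which is why the computation maps naturally onto localization areas and can be distributed across workers.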
Unfortunately, one of the restrictions of OASIS.MASK compared to OASIS leads to a large increase in file size. In order to have a better distribution of the parasitics introduced by all the dummy geometries, they are often placed in a non-orthogonal way as explained in, see FIGURE 4. A full chip may contain millions of dummy geometries and it is important to have them instantiated in arrays instead of having millions of individual instantiations.
In GDSII, it was only possible to have orthogonal arrays, and the appearance of such dummies in designs has led to a dramatic increase in file size. One of the interesting features of OASIS was the possibility to describe non-orthogonal arrays of cells or geometries. This is known as type 8 repetition. Thanks to this feature, it has been possible to reduce file size. But OASIS.MASK does not allow repetitions of types other than 0-3, i.e. regular orthogonal arrays. This has a very negative impact on the output file size.
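The saving offered by a non-orthogonal repetition can be illustrated with a lattice defined by two arbitrary displacement vectors. This is a sketch of the idea only: the function name and argument layout are invented, and the record count for the orthogonal case assumes the simplest decomposition, one horizontal array per shifted row.

```python
def lattice_positions(origin, n, m, vec_n, vec_m):
    """Positions of an n x m lattice spanned by two displacement vectors."""
    ox, oy = origin
    return [
        (ox + i * vec_n[0] + j * vec_m[0], oy + i * vec_n[1] + j * vec_m[1])
        for j in range(m)
        for i in range(n)
    ]


# Staggered dummies: each row is shifted by 4 units relative to the previous one.
positions = lattice_positions((0, 0), n=3, m=2, vec_n=(10, 0), vec_m=(4, 10))
print(positions)

# One non-orthogonal repetition describes all n * m placements, whereas
# orthogonal arrays alone need one array per shifted row in this decomposition.
records_non_orthogonal = 1
records_orthogonal = 2  # m rows
```

With millions of dummies arranged in thousands of shifted rows, the difference in record count translates directly into file size.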
The second critical operation to be done on final design layout is OPC. This consists in slightly modifying all existing geometries to compensate light diffraction and interference phenomena through the photo-mask. This is also a time consuming operation that requires a flat view of the layout. As their name implies, these optical corrections are only related to proximity effects. So a window based analysis is well adapted to this processing and OASIS.MASK format meets all requirements.
All geometries in an OASIS.MASK file are split into basic trapezoids. This means that the fracturing step is mostly done, even if some work might still remain, like boolean operations, clipping or sizing (bias adjustment). It is also important to point out that OASIS.MASK does not allow any rotation. Any cell with a rotation should be flattened and all its geometries converted to basic trapezoids whose type will not change. Performing computations on trapezoids is quite easy and very fast, and always results in simple shapes that are easily fractured into trapezoids. For example, clipping a trapezoid by a rectangle will never result in more than one convex polygon. See FIGURE 5. In the same way, applying a sizing to a trapezoid will always result in a trapezoid of the same type.
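The clipping property can be illustrated with the classic Sutherland-Hodgman algorithm, used here as a generic technique and not as the method of any particular tool. The sketch clips a convex polygon, given as a list of (x, y) vertices, against an axis-aligned rectangle and checks that the result is convex.

```python
def clip_to_rect(poly, xmin, ymin, xmax, ymax):
    """Sutherland-Hodgman clipping of a convex polygon by a rectangle."""
    def clip_edge(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps around to the last vertex when i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def x_cross(x):
        def cross(p, q):
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return cross

    def y_cross(y):
        def cross(p, q):
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return cross

    pts = list(poly)
    for inside, intersect in (
        (lambda p: p[0] >= xmin, x_cross(xmin)),
        (lambda p: p[0] <= xmax, x_cross(xmax)),
        (lambda p: p[1] >= ymin, y_cross(ymin)),
        (lambda p: p[1] <= ymax, y_cross(ymax)),
    ):
        pts = clip_edge(pts, inside, intersect)
        if not pts:
            break  # polygon entirely outside the rectangle
    return pts


def is_convex(pts):
    """True when all non-zero turns of the vertex list have the same sign."""
    sign = 0
    n = len(pts)
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = pts[i], pts[(i + 1) % n], pts[(i + 2) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            if sign and (cross > 0) != (sign > 0):
                return False
            sign = cross
    return True


trapezoid = [(0, 0), (10, 0), (8, 4), (2, 4)]
clipped = clip_to_rect(trapezoid, 1, 0, 9, 4)
print(clipped, is_convex(clipped))
```

The result follows from geometry rather than from the algorithm: the intersection of two convex shapes is convex, so a single polygon always suffices.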
As stated in , OASIS.MASK is well suited for efficient reading, especially for a viewer that usually displays a small area of the whole mask. To allow for fast random access based on the topology (e.g., display the top left corner of the mask), the OASIS.MASK file must use localization areas efficiently. For example, in FIGURE 6, only the localization area matrix will permit fast access.
When building the matrix, there is a tradeoff between access time and file size: small localization areas will allow for faster access, but will also increase file size. On the contrary, larger localization areas slow down read access but keep the file smaller. The extreme case is to use only one localization area: we are back to a regular OASIS file, very compact but without topological information.
Another thing to take into account is the amount of cells that will be put in common between all the localization areas. Putting many cells in common will reduce the file size, but will also increase access time. Note that if file size becomes an issue, one can also use CBLOCKs (one per localization area) to compress data.
Also note that localization areas are not declared in a regular array, but just as a list of statements in the top cell record. Hence they do not have to share common dimensions. For some files, it can thus be more efficient to build an “adaptive” matrix. For example, the localization area width and height can vary based on the cell density of the mask.
All these parameters need to be taken into account in the OASIS.MASK writer to strike a good balance between file size and random-access speed.
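The benefit for a viewer can be sketched as a lookup over the list of localization areas. The LocalizationArea tuple, its fields and the offsets below are hypothetical; in a real file the extents and offsets would come from the properties attached to the top cell.

```python
from typing import NamedTuple


class LocalizationArea(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int
    offset: int  # hypothetical file offset of the area's records


def areas_for_view(areas, vx0, vy0, vx1, vy1):
    """Return the areas whose extent overlaps the viewport rectangle."""
    return [
        a for a in areas
        if a.x0 < vx1 and vx0 < a.x1 and a.y0 < vy1 and vy0 < a.y1
    ]


# An "adaptive" layout: one tall area on the left, two smaller ones on the right.
areas = [
    LocalizationArea(0, 0, 50, 100, offset=1000),
    LocalizationArea(50, 0, 100, 50, offset=2000),
    LocalizationArea(50, 50, 100, 100, offset=3000),
]

to_load = areas_for_view(areas, 60, 60, 90, 90)
print([a.offset for a in to_load])
```

With a single localization area covering the whole mask, the same query would return the entire file, which is the extreme case described above.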
During the last few years, we have seen all the benefits of the OASIS format, as well as some of its weaknesses. OASIS is a complex format that provides many options, and it seems like everybody uses it in a different, sometimes suboptimal, way.
OASIS.MASK fixes some of these issues while at the same time offering a topological view of the database, which is required by some tools in the layout flow. And its compatibility with OASIS allows for a smooth transition between the design world and the process world. As such, OASIS.MASK is an important step towards standardizing the various file formats used in microelectronics.
OASIS.MASK still has some shortcomings, and we have pointed out some of the limitations which today prevent integrating this format in some steps of the flow. But OASIS.MASK is still young, and, as its specification mentions, “part of restrictions on OASIS.MASK may be relaxed, and OASIS.MASK specification may be extended”. We are confident that this format will be able to evolve and become a standard used across the whole design and layout flow. Our industry has much to gain from it in terms of robustness and usability.
1. Karmarkar, Aditya P. and Xu, Xiaopeng and Moroz, Victor and Rollins, Greg and Lin, Xiao, “Analysis of performance and reliability trade-off in dummy pattern design for 32-nm technology”, IEEE Computer Society (2009), 185–189.
2. Morales Domingo, Pablo Canepa Juan, Cohen Daniel, “Efficient OASIS.MASK reader”, SPIE (2010).
3. P. Morey, “Going from GDSII to Oasis”, EEtimes (2008).
4. Philippe Morey-Chaisemartin, “Layout finishing of a 28nm, 3 billions transistors, multi-core processor” (2013).
5. SEMI International Standards, “SEMI P45-0211 SPECIFICATION FOR JOB DECK DATA FORMAT FOR MASK TOOLS”.
6. SEMI International Standards, “SEMI P39-0308 – OASIS (TM) – Open Artwork System Interchange Standard” (2008).
7. SEMI International Standards, “SEMI P44-0211 – Specification for Open Artwork System Interchange Standard (OASIS ®) Specific to Mask Tools” (2010).
Dr PHILIPPE MOREY-CHAISEMARTIN, FREDERIC BRAULT, XYALIS, Grenoble, France. Email: firstname.lastname@example.org, email@example.com