Prettier - Scalability

Introduction to Prettier

Prettier is an opinionated code formatter with support for a wide range of programming languages and frameworks1. Prettier can be run locally via a CLI2, a pre-commit hook3, or a supported IDE. Some companies also run Prettier in their CI/CD pipeline. Prettier should cover many different use cases, from a programmer formatting the single file they are currently working on in their IDE to huge companies formatting their whole codebase in a CI/CD pipeline. Prettier lists 81 large companies, such as Facebook and Spotify, that use it4. Prettier must therefore make formatting feel performant both on a single file and on a huge codebase. Since Prettier is executed locally, scaling Prettier is less about serving many concurrent users and more about supporting high data throughput on a single machine, i.e., formatting large numbers of files. We will therefore mostly discuss parallelization for performance, as this is more relevant for Prettier and parallelization shares many issues and pitfalls with scaling distributed systems.

We will begin this text with a description of the scalability issues that Prettier faces in section Prettier and Scalability. This will elaborate on the brief discussion above and present the results of tests for the identified issues. In section Architecting for Scalability we will describe the current architecture as it relates to the scalability issues and follow up with possible and planned architectural changes to solve these issues.

Prettier and Scalability

“Scalability is a function of how performance is affected by changes in workload or in the resources allocated to run your architecture.”5. Scalability falls into two broad categories: horizontal and vertical scaling. Vertical scaling is about upgrading the resources on a single node. An example of vertical scaling would be to increase the number of CPUs on a single computer. Horizontal scaling is about adding more nodes to solve a problem. An example of horizontal scaling is adding more computers to solve a task.

As mentioned in the introduction, Prettier runs on a single machine, so scalability is about making Prettier performant with different sizes and types of data input. Horizontal scaling does not fit Prettier since it runs on one machine, and vertical scaling is outside Prettier’s control because it cannot decide what kind of resources the user’s computer has. What Prettier can do is make good use of stronger machines with multiple CPUs by allowing the code to run in parallel, or speed up parts of the code by improving the algorithms. Prettier is already quite performant: in 2017 one of the maintainers of Prettier, Christopher Chedeau, claimed that it would take only three minutes to run Prettier on the entire Facebook codebase6.

There are four key scalability challenges we have identified that are relevant for Prettier:

  1. The size of the files that Prettier should format. There was an issue, which is now closed, where running the Ruby plugin for Prettier made it crash if the files were larger than 30000 lines of code7.
  2. The number of files that the user asks Prettier to format. There was an issue, which is now closed, where a user asked Prettier to format 10000 files and it crashed their Mac8.
  3. Prettier must support formatting different types of files in large codebases and asking Prettier to format different file types might affect its performance.
  4. Prettier has a large and diverse userbase and should support all their relevant use cases. Prettier should be performant for programmers formatting single files and huge companies wanting to format their whole codebases.

To check the current state of Prettier’s performance under varying workloads, we created a test repository that exercises the key scalability challenges Prettier faces at the moment.

We ran the tests on a laptop with these specs. To improve reproducibility, we also created some scripts, and the test process is described in the README.

The tests were split into three parts: formatting a single large JavaScript file, formatting many JavaScript files with one line of code each, and formatting files of different types with one line of code each. For the different file types, we used equal amounts of JavaScript, CSS, JSON, Markdown, and TypeScript code.

As discussed in an issue on Prettier9, the time Prettier reports is only the time taken to format the files; it does not include the overhead of loading Node and Prettier. We have therefore included both Prettier’s reported time in ms and the real time reported by the bash command time.
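The difference between the two measurements can be sketched as follows. This is a minimal illustration and not the script from our test repository: the helper name `time_command` is hypothetical, and the example command only starts a Python interpreter so that the sketch runs without Prettier installed. Timing the whole child process from the outside captures start-up and module loading, which a formatter's self-reported time would leave out.

```python
import subprocess
import sys
import time


def time_command(command):
    """Run a command and return (exit code, wall-clock seconds).

    The measurement wraps the whole child process, so it includes
    start-up overhead, similar to the "real" value of the shell's `time`.
    """
    start = time.perf_counter()
    completed = subprocess.run(command, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Stand-in command. With Prettier installed and on the PATH (an
    # assumption), it could be ["prettier", "--write", "path/to/dir"].
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, real time {seconds:.3f}s")
```

Even this empty stand-in command takes a measurable amount of real time, which is the kind of fixed cost that the reported formatting time does not show.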

Test results:

Number of lines | Single large file (Prettier ms / real time) | Many files (Prettier ms / real time) | Many different files (Prettier ms / real time)
--------------- | ------------------------------------------- | ------------------------------------ | ----------------------------------------------
100             | 183ms / 1.925s                              | 823ms / 12.816s                      | 2153ms / 19.48s
1000            | 591ms / 2.268s                              | 5116ms / 1m37.498s                   | 5973ms / 2m44.658s
2000            | 843ms / 2.584s                              | 7745ms / 5m16.821s                   | 10228ms / 5m20.225s
10000           | 5989ms / 11.836s                            | 44522ms / 23m56.831s                 | 52160ms / 23m24.805s
100000          | 49165ms / 54.126s                           | NOT DONE                             | NOT DONE
200000          | 199496ms / 3m25.324s                        | NOT DONE                             | NOT DONE
300000          | ERROR                                       | NOT DONE                             | NOT DONE

As we can see from the tests, Prettier performs much better on a single file than on many files with the same total number of lines. This makes sense, as Prettier avoids the overhead of opening, writing, and closing many files. We also see that the time Prettier reports is higher when formatting different file types, even though the real time is almost the same for the larger runs. Prettier reports an error when the JavaScript file is 300000 lines of code, which is unrealistically large. We did not perform tests with more than 10000 files because the previous tests were too time-consuming.

To reduce Prettier’s running time, parallelization can be used to format files in parallel. Amazon Web Services did some tests and was able to cut the running time by approximately 50 percent10.
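To make the gap between reported time and real time concrete, the short sketch below computes how much of the real time Prettier itself reports, using the 10000-line row of the table above. The function name `reported_share` is our own, and attributing the remainder to work outside the formatting step (for example process start-up and file handling) is our interpretation, not something Prettier reports.

```python
# (label, Prettier-reported milliseconds, real seconds), taken from the
# 10000 row of the results table.
MEASUREMENTS = [
    ("single large file", 5989, 11.836),
    ("many files", 44522, 23 * 60 + 56.831),
    ("many different files", 52160, 23 * 60 + 24.805),
]


def reported_share(reported_ms, real_seconds):
    """Fraction of the real time that Prettier reports as formatting time."""
    return (reported_ms / 1000) / real_seconds


if __name__ == "__main__":
    for label, reported_ms, real_seconds in MEASUREMENTS:
        share = reported_share(reported_ms, real_seconds)
        print(f"{label}: {share:.1%} of the real time is reported")
```

For the single large file, roughly half of the real time is reported formatting time, while for the two many-file runs it is only a few percent.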

Architecting for Scalability

Current architecture

As seen, Prettier is pretty fast. Formatting Facebook’s entire codebase in around three minutes is certainly a great feat. However, to achieve this performance Christopher Chedeau used Unix tools to start multiple parallel Prettier processes6. Since Prettier can in theory be run in parallel, i.e., formatting different files simultaneously, it has potential for a significant speed-up according to Amdahl’s law11. Parallel execution is not supported natively by Prettier; instead, when executing prettier --write path/to/dir, every file in the directory is formatted sequentially in one process12, as demonstrated in the figure below.

Figure: Prettier as-is: runs formatting sequentially
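Amdahl's law, referred to above, bounds the speed-up by the part of the work that stays sequential: with a parallelizable fraction p and n workers, the speed-up is 1 / ((1 - p) + p / n). The sketch below evaluates this formula. The value p = 0.95 is purely an illustrative assumption and not a measured property of Prettier.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speed-up according to Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed parallelizable fraction, chosen only for illustration.
    p = 0.95
    for n in (1, 2, 4, 8, 16):
        print(f"{n:>2} workers: speed-up {amdahl_speedup(p, n):.2f}x")
```

The sequential remainder, such as start-up and collecting the file list, caps the achievable speed-up no matter how many workers are added.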

Another identified performance issue concerns Markdown files with multiple embedded code blocks. To format these blocks, Prettier starts a new third-party process each time one is encountered, as shown in the figure below, and does so sequentially13. For files with many code blocks this can be a problem: formatting has taken as much as 2-3 seconds, which can make the editor feel slow and glitchy when using Format on Save13.

Figure: Sequential formatting of a Markdown-file with two code blocks.

To improve computing performance, one can improve the algorithms used, or identify independent parts of the problem and process them in parallel. Photoshop was an early adopter of the latter method and has used it successfully14. Parallelism could solve this second issue, but Prettier’s current architecture is mostly synchronous, which does not allow for parallel processing12 13. Recently, the Prettier CLI was changed to be asynchronous as a step toward making Prettier parallel15, but the test infrastructure, parsers, and printers are still synchronous.

Future architecture

There is an active issue discussion on how to make the parsers and printers asynchronous, and one of the most active maintainers has been working on this13. These changes are comprehensive and will lead to compatibility issues with some plugins. There have been suggestions to support both synchronous and asynchronous versions of certain functions, while others argue that supporting only asynchronous versions would make the codebase easier to maintain. As of now, Prettier will add asynchronous versions of parsers and printers, but how this will be implemented has not been decided13. With this in place, solving the Markdown issue mentioned in the previous subsection will be simple using a Scatter/Gather approach: a worker could be spawned for each of the third-party processes, the workers executed in parallel, and the results combined5, as illustrated below. There are reasons for caution when making the code parallel, since it can be a source of bugs. Such bugs can be difficult to detect and solve, as happened at Photoshop, where a bug in parallel code lived for ten years14. Parallel code is notoriously hard to debug14, and its added complexity might be a reason why Prettier’s maintainers have been hesitant to make Prettier run in parallel.

Figure: Parallel formatting of a Markdown-file with two code blocks.
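The Scatter/Gather idea for Markdown can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Prettier's implementation: `format_block` is a placeholder that only strips trailing whitespace, standing in for a call to an external formatter, and the regular expression handles only simple fenced blocks. Threads are used because the real work would happen in separate third-party processes that the workers merely wait for.

```python
import re
from concurrent.futures import ThreadPoolExecutor

FENCE = "`" * 3  # the Markdown code fence, built here to keep the sketch embeddable
CODE_BLOCK = re.compile(rf"({FENCE}\w*\n)(.*?)({FENCE})", re.DOTALL)


def format_block(code):
    """Placeholder formatter (assumption): strip trailing whitespace per line."""
    return "\n".join(line.rstrip() for line in code.split("\n"))


def format_markdown(text, workers=4):
    """Format all fenced code blocks of a Markdown text in parallel."""
    matches = list(CODE_BLOCK.finditer(text))

    # Scatter: hand each code block body to a worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        formatted = list(pool.map(format_block, (m.group(2) for m in matches)))

    # Gather: splice the results back in their original order.
    pieces = []
    cursor = 0
    for match, body in zip(matches, formatted):
        pieces.append(text[cursor:match.start(2)])
        pieces.append(body)
        cursor = match.end(2)
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    sample = (
        "# Title\n\n"
        + FENCE + "js\nlet a = 1;   \n" + FENCE + "\n\n"
        + FENCE + "css\na { }\t\n" + FENCE + "\n"
    )
    print(format_markdown(sample))
```

Because `pool.map` returns results in input order, the gather step does not need any extra bookkeeping to put each block back in the right place.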

Figure: Prettier to-be: Prettier spawns workers and runs in parallel if the --parallel flag is set

The effort to make Prettier run on multiple files at the same time has already progressed far. In 2019 a separate tool called parallel-prettier was available for download. On a project containing 1200 files, it was able to reduce execution time from 27 seconds to 1 second. Preparations have been made to integrate parallelization in Prettier, but as of 2021, there were still issues adapting the test infrastructure to this change12. The image above depicts how parallelization could be implemented natively into Prettier using the flag --parallel.
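One way to picture multi-file parallelization is to split the file list into batches and run one formatter process per batch, several at a time. The sketch below is our own illustration of that idea and does not show how parallel-prettier or a native --parallel flag would be implemented. The function names and the batch size are assumptions, and the demo uses a stand-in command so it runs without Prettier installed.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_parallel(command_prefix, files, workers=4, chunk_size=50):
    """Run `command_prefix + batch` once per batch, several batches at a time.

    Returns the exit code of each batch, in batch order.
    """
    def run_batch(batch):
        completed = subprocess.run(command_prefix + batch, capture_output=True)
        return completed.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_batch, chunk(files, chunk_size)))


if __name__ == "__main__":
    # Stand-in command that ignores its arguments. With Prettier installed
    # and on the PATH (an assumption), the prefix could be
    # ["prettier", "--write"].
    prefix = [sys.executable, "-c", "pass"]
    files = [f"file{i}.js" for i in range(200)]  # hypothetical file names
    print(run_in_parallel(prefix, files, workers=4, chunk_size=50))
```

Batching keeps the number of process start-ups low, while the worker limit avoids opening too many files or processes at once.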

The two improvements described above are performance improvements. Testing these changes would therefore be based on comparing execution or CPU time. The test cases should vary from large codebases to smaller projects and single files. In addition to CPU time, it is important that the parallelized version outputs correctly formatted code. This could be checked by simply validating that its output matches the sequential program’s output.
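The correctness check described above, comparing parallel output against sequential output, could look roughly like this. It is a sketch with placeholder pieces: `format_source` is a toy formatter that strips trailing whitespace, and `find_mismatches` is a hypothetical helper name. A real test would call the actual sequential and parallel code paths instead.

```python
from concurrent.futures import ThreadPoolExecutor


def format_source(source):
    """Toy formatter (assumption): strip trailing whitespace on each line."""
    return "\n".join(line.rstrip() for line in source.splitlines()) + "\n"


def format_sequential(sources):
    return [format_source(source) for source in sources]


def format_parallel(sources, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(format_source, sources))


def find_mismatches(sources, sequential, parallel):
    """Return the indices where the parallel output differs from the sequential."""
    expected = sequential(sources)
    actual = parallel(sources)
    if len(expected) != len(actual):
        raise ValueError("implementations returned different numbers of results")
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


if __name__ == "__main__":
    samples = ["let a = 1;   ", "a { }\t\nb { }", "{}"]
    print(find_mismatches(samples, format_sequential, format_parallel))
```

An empty list means both versions agree on every input; any index in the result points directly at an input on which the parallel version needs investigating.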

To test the improved performance of running Prettier in parallel, we ran the same tests as previously discussed, but using the Parallel-Prettier tool. Parallel-Prettier does not provide the time it takes to format each file, but we can still check the real-time using the time command. The results of these tests can be found in the test repository.

Test results:

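Number of lines | Single large file (real time) | Many files (real time) | Many different files (real time)
--------------- | ----------------------------- | ---------------------- | --------------------------------
100             | 1.168s                        | 2.356s                 | 2.606s
1000            | 1.713s                        | 7.916s                 | 7.854s
2000            | 1.814s                        | 10.570s                | 11.357s
10000           | 3.441s                        | 27.429s                | 27.310s
100000          | 53.570s                       | NOT DONE               | NOT DONE
200000          | 2m52.615s                     | NOT DONE               | NOT DONE
300000          | ERROR                         | NOT DONE               | NOT DONE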

Comparing the results for Prettier and Parallel-Prettier, there is a huge performance boost: the time it takes to format 10000 files drops from almost 24 minutes to about 27 seconds. We do not see a comparable improvement when running on a single large file.
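The speed-up for the many-files test can be worked out directly from the two result tables. The sketch below converts the output format of the time command to seconds and divides the sequential time by the parallel time; the helper name `parse_real_time` is our own.

```python
import re


def parse_real_time(text):
    """Convert `time` output such as '23m56.831s' or '27.429s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# (number of files, Prettier real time, Parallel-Prettier real time),
# copied from the "Many files" columns of the two tables.
MANY_FILES = [
    (100, "12.816s", "2.356s"),
    (1000, "1m37.498s", "7.916s"),
    (2000, "5m16.821s", "10.570s"),
    (10000, "23m56.831s", "27.429s"),
]

if __name__ == "__main__":
    for count, sequential, parallel in MANY_FILES:
        speedup = parse_real_time(sequential) / parse_real_time(parallel)
        print(f"{count:>5} files: {speedup:.1f}x faster")
```

For 10000 files this gives a speed-up of about 52 times, and the factor grows with the number of files in these measurements.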

Changing Prettier to be asynchronous is also beneficial for other reasons. Some parsers, such as the one for less.js, are only available with an asynchronous API and are therefore incompatible with Prettier16. The change is also crucial for the planned switch to ES modules17 in Prettier v3, since Node.js cannot load them synchronously16.

Sources


  1. Prettier. (2022). Why Prettier?. Retrieved March 5, 2022, from https://prettier.io/docs/en/why-prettier.html

  2. Prettier. CLI. Retrieved March 1, from https://prettier.io/docs/en/cli.html#list-different

  3. Prettier. Pre-commit Hook. Retrieved March 1, from https://prettier.io/docs/en/precommit.html

  4. Prettier. (2022). Who’s using Prettier?. Retrieved February 19, 2022, from https://prettier.io/en/users/

  5. Pautasso, C. (2021). Software Architecture: visual lecture notes. Retrieved from https://leanpub.com/software-architecture/

  6. YouTube. (2017, April 18). Javascript code formatting - Christopher Chedeau, React London 2017. Retrieved February 26, 2022, from https://youtu.be/0Q4kUNx85_4?t=1480

  7. Prettier. Running prettier against large files crashes. Retrieved March 22, 2022, from https://github.com/prettier/plugin-ruby/issues/757

  8. Prettier. Too many files open error with prettier cli. Retrieved March 22, 2022, from https://github.com/prettier/prettier/issues/994

  9. Prettier. Slow start time. Retrieved March 24, 2022, from https://github.com/prettier/prettier/issues/3386

  10. AWS. Running prettify on generate-clients takes 80 percent of the time. Retrieved March 24, 2022, from https://github.com/aws/aws-sdk-js-v3/issues/1947

  11. GeeksforGeeks. (2019, April 1). Computer Organization | Amdahl’s law and its proof. Retrieved March 22, 2022, from https://www.geeksforgeeks.org/computer-organization-amdahls-law-and-its-proof/

  12. Prettier. Feature Request: Parallel/Clustered Prettier. Retrieved March 22, 2022, from https://github.com/prettier/prettier/issues/4980

  13. Prettier. Making parser or printer async. Retrieved March 22, 2022, from https://github.com/prettier/prettier/issues/4459

  14. Cole, C., & Williams, R. (2010, September 9). Photoshop Scalability: Keeping It Simple. ACM Queue. Retrieved March 22, 2022, from https://queue.acm.org/detail.cfm?id=1858330

  15. Prettier. Use async APIs in CLI. Retrieved March 22, 2022, from https://github.com/prettier/prettier/pull/10841

  16. Prettier. Proposal: Move to async api. Retrieved March 23, 2022, from https://github.com/prettier/prettier/issues/7799

  17. Prettier. Drop support for Node.js 10, switch to ES Module. Retrieved March 13, 2022, from https://github.com/prettier/prettier/issues/10157