Trying to figure out how to use Scrivener's "Binder" structure in a project

Cavalierex · September 11, 2024, 5:34pm

@ttscoff, I have a general nerdery question for you. As the esteemed developer of Marked, which amazingly can preview Scrivener documents so well, you might have some wonderful insight to share.

This goes back to my (never-ending) quest to create the best way to work on longform Quarto documents. A Quarto “Book” project has a hierarchy of folders and documents. The structure matters less than one might think, because the _quarto.yml configuration file will specify the organization of the Book (into “parts”, “chapters”, and “appendices”). Nevertheless, it is common when editing a Quarto Book to create a hierarchy of folders (so you keep your sanity managing dozens of text documents).

I am wondering whether I can write my longform Quarto projects in Scrivener (with all the power tools available to me there), and then compile the contents into essentially a hierarchy of folders and files that is set up as the perfect starting point for the quarto render --to=pdf command to assemble the Book.

The challenge before me is this: How can I use Scrivener’s existing Binder structure to dynamically

1. create the hierarchy of nested folders that organize the files,
2. deposit the compiled Markdown documents into the right place, and
3. populate the _quarto.yml configuration file with the relative file paths to these parts/chapters/appendices?

Here’s an example of a hypothetical Binder structure: [screenshot not preserved]

The folder tree that needs to be created would be something like this: [folder-tree image not preserved]

And a _quarto.yml file will need to be created that records this hierarchy as file paths, e.g.:


# This file starts out with the options contained in the _quarto.yml document in Scrivener's Binder...

project:
  type: book

book:
  title: "My Quarto Book Project"
  subtitle: "Hypothetical Example of a Quarto Book from Scrivener"
  author: "cavalierex"

  date: last-modified
  
# The following chapter and appendix listing is what gets injected dynamically from the Binder structure...

  chapters: 
    - index.qmd
    - part: "FRONTMATTER"
      chapters: 
        - _chapters/FRONTMATTER/preface.qmd
        - _chapters/FRONTMATTER/foreword.qmd
    
    - part: "THE FIRST PART"
      chapters: 
        - _chapters/THE_FIRST_PART/introduction.qmd
      
    - part: "THE SECOND PART"
      chapters: 
        - _chapters/THE_SECOND_PART/chapter_3.qmd
        - _chapters/THE_SECOND_PART/chapter_4.qmd
              
    - part: "THE THIRD PART"
      chapters:
        - _chapters/THE_THIRD_PART/afterword.qmd
        - _chapters/THE_THIRD_PART/references.qmd
        
      
  appendices:
    - _chapters/APPENDICES/appendix_a.qmd
    - _chapters/APPENDICES/glossary.qmd

# and back to specifying the other options contained in the _quarto.yml document in Scrivener's Binder...

So, with that longwinded introduction over, my question for you is: What is the approach that you used in Marked to preview the nested hierarchical Binder structure to show all the documents? And what advice would you give for an approach to creating the nested hierarchy of folders/files as well as dynamically entering the path list into the _quarto.yml file?

(Respectfully, I am not asking you to solve the problem for me, nor do I expect you to necessarily figure out the Scrivener “compile” side of things. I am simply looking for advice re: how to exploit the power of the Binder to create the starting point for a Quarto Book project, which is essentially nested folders and files.)

I appreciate your time and advice!!
–Alexander.

PS> For what it’s worth, Python would likely be my tool of choice.

ttscoff · September 13, 2024, 2:26pm

In Marked I parse the .scrivx XML file inside the .scriv package (which is a folder, if you haven’t already checked that out), recursively rendering each linked RTF file to Markdown and compiling the results in order into a single Markdown document. The method determines whether the current node is a leaf, in which case it renders it, or a parent, in which case it runs itself on each of the parent’s children. The same method could be adapted by creating a folder whenever the node is a parent and then parsing its children, continuing down through any nesting. A YAML outline could be built during the parse by storing the document outline as a dictionary and then outputting the formatted result at the end of processing. To fully automate this you’ll need a method of converting RTF to Markdown; I can’t recall whether Scrivener’s Markdown export creates nested trees like you need.
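As a rough illustration of the leaf-or-parent recursion described above, here is a minimal Python sketch. It is not Marked's code. The inline XML is a trimmed, hypothetical stand-in for a .scrivx file, and the element and attribute names (Binder, BinderItem, Title, Children, UUID, Type) are assumptions to check against a real project. The helper names slug, build_tree, and export_binder are made up for this example, and each leaf gets placeholder text instead of converted RTF.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical, trimmed stand-in for a .scrivx file (names are assumptions).
SAMPLE_SCRIVX = """
<ScrivenerProject>
  <Binder>
    <BinderItem UUID="A1" Type="Folder">
      <Title>THE FIRST PART</Title>
      <Children>
        <BinderItem UUID="B1" Type="Text"><Title>introduction</Title></BinderItem>
      </Children>
    </BinderItem>
    <BinderItem UUID="A2" Type="Folder">
      <Title>THE SECOND PART</Title>
      <Children>
        <BinderItem UUID="B2" Type="Text"><Title>chapter 3</Title></BinderItem>
        <BinderItem UUID="B3" Type="Text"><Title>chapter 4</Title></BinderItem>
      </Children>
    </BinderItem>
  </Binder>
</ScrivenerProject>
"""


def slug(title: str) -> str:
    """Turn a Binder title into a file-system-friendly name."""
    return "_".join(title.split())


def build_tree(item: ET.Element, parent_dir: Path, base_dir: Path, outline: list) -> None:
    """Create a folder for a parent node or a .qmd file for a leaf, recursing into children."""
    title = item.findtext("Title", default="untitled")
    children = item.find("Children")
    if children is not None and len(children) > 0:
        # Parent: make the folder, record a "part", then recurse.
        folder = parent_dir / slug(title)
        folder.mkdir(parents=True, exist_ok=True)
        node = {"part": title, "chapters": []}
        outline.append(node)
        for child in children.findall("BinderItem"):
            build_tree(child, folder, base_dir, node["chapters"])
    else:
        # Leaf (or a folder without children): write a placeholder file.
        target = parent_dir / f"{slug(title)}.qmd"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(f"# {title}\n", encoding="utf-8")
        outline.append(target.relative_to(base_dir).as_posix())


def export_binder(scrivx_text: str, base_dir: Path) -> list:
    """Walk every top-level Binder item and return the collected outline."""
    root = ET.fromstring(scrivx_text.strip())
    outline: list = []
    for item in root.findall("./Binder/BinderItem"):
        build_tree(item, base_dir / "_chapters", base_dir, outline)
    return outline


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(export_binder(SAMPLE_SCRIVX, Path(tmp)))
```

The outline is collected as plain lists and dicts, so it can be turned into YAML once the walk has finished, as the post suggests.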

Does that help? Feel free to ask specific questions, as I might have lost the thread in this long post.

Cavalierex · September 18, 2024, 12:30am

Thanks, Brett. Your tip helped get me started. I didn’t know the .scriv package was a simple folder – I thought it was compressed somehow and would be harder to access and explore. Once inside the package, it was easy to find the .scrivx XML file as well as the /Files/Data/ directory.

Okay, confession time: I think I spent about 30 hours getting this thing to work! But it actually works great right now.

Basically, I am able to use the binder to organize my project, and I write in Markdown in Scrivener’s editor. I can use all of Scrivener’s handy tools while I am brainstorming, researching, and writing.
When I want to preview or render the document in Quarto, I execute my Python program (which I call “Squarto”). Squarto parses the XML file to understand the binder structure, creates parallel folder/file structures, moves the content files into the proper location (converting them to plain .qmd Markdown), creates the YAML content describing the structure of the book and injects that into the _quarto.yml file, and finally runs quarto render --to=pdf etc. From that point on, Quarto takes care of the rest (code execution, citation/bibliography creation, cross-refs, table of contents, indexes, etc.).
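To make the "create the YAML and inject it" step concrete, here is a small sketch using only the standard library. It is not Squarto's actual code. The function names outline_to_yaml and inject_chapters, the outline shape (a list of file paths and {"part", "chapters"} dicts), and the placeholder comment "# SQUARTO_CHAPTERS" are all assumptions made for illustration. The indentation follows the _quarto.yml example from the first post.

```python
# Hypothetical placeholder line that a _quarto.yml template would contain.
MARKER = "  # SQUARTO_CHAPTERS\n"


def outline_to_yaml(outline: list, indent: int = 2) -> str:
    """Render an outline as the `chapters:` block nested under `book:`.

    Assumes one level of parts, and titles that contain no double quotes.
    """
    pad = " " * indent
    lines = [f"{pad}chapters:"]
    for entry in outline:
        if isinstance(entry, dict):
            # A part groups several chapter files under a heading.
            lines.append(f'{pad}  - part: "{entry["part"]}"')
            lines.append(f"{pad}    chapters:")
            for chapter in entry["chapters"]:
                lines.append(f"{pad}      - {chapter}")
        else:
            # A bare string is a chapter file outside any part.
            lines.append(f"{pad}  - {entry}")
    return "\n".join(lines) + "\n"


def inject_chapters(template: str, outline: list) -> str:
    """Replace the first placeholder line in the template with the generated block."""
    return template.replace(MARKER, outline_to_yaml(outline), 1)


if __name__ == "__main__":
    template = 'project:\n  type: book\n\nbook:\n  title: "My Quarto Book Project"\n' + MARKER
    outline = [
        "index.qmd",
        {"part": "THE FIRST PART", "chapters": ["_chapters/THE_FIRST_PART/introduction.qmd"]},
    ]
    print(inject_chapters(template, outline))
```

Emitting the text by hand keeps the sketch dependency-free. A YAML library would handle quoting and escaping more robustly.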

The system that I created does not rely on Scrivener’s built-in Compile process (as the “Squarto” program is essentially an alternate export/compile/postprocess pipeline). But it does respect some Compile settings, for example excluding some documents from being exported to Quarto.

There are several features that I’d like to add to “Squarto”, and I’ll likely iterate on it over the next year to get it to “production stage.” I plan to open-source it at that stage.

Thanks for your help!

ttscoff · September 18, 2024, 5:53pm

Holy cow, that’s so awesome. Nice work! I wonder if it’s generalized enough for anyone else to make use of? If so, and you want to publish it, I’d be happy to mention it in a Web Excursions post.

Cavalierex · September 20, 2024, 9:03pm

Presently, Squarto is a nicely structured program with separation of concerns. It runs as a script with fixed parameters, but I will eventually make this a library (to import into other projects) with a CLI for local execution with parameters/options passed at the command line.

Since this is the General Nerdery channel, in case the workflow interests you or other readers, here is a high-level view…^[1]

def main():

    SCRIV_FILE = 'scrivener_file.scriv'  # Eventually pass via CLI

    if is_valid_scriv_package(SCRIV_FILE):

        # Identify components from the .scriv package and .scrivx file
        scrivx_file = get_scrivx_file_from_scriv_file(SCRIV_FILE)
        data_dir = get_data_dir_from_scriv(SCRIV_FILE)
        binder = extract_binder_from_scrivx(scrivx_file, folders_to_exclude={'Planning'})

        # Identify file paths to the content.rtf for each element in the binder
        binder_paths = extract_binder_paths_as_dict(binder)

        # Create Quarto directory structure and copy files over
        base_dir = '/Users/me/my_path/to/quarto_output'  # Eventually, pass via CLI
        create_directories_and_files_from_binder_paths(binder_paths=binder_paths, base_dir=base_dir)
        move_content_files(binder_paths=binder_paths, data_dir=data_dir, base_dir=base_dir)   # Includes calls to `convert_rtf_to_plaintext()` and `process_plaintext()`

        # Make sure _quarto.yml gets updated
        update_quarto_yaml_with_chapters_parts_appendices(binder, base_dir)
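
The two values marked "Eventually pass via CLI" in the listing above could be supplied with argparse from the standard library. This is only a sketch of one possible interface: the program name, the argument names, and the --exclude option are assumptions, not Squarto's real command line.

```python
import argparse
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the hypothetical command-line options for the export script."""
    parser = argparse.ArgumentParser(
        prog="squarto",
        description="Export a Scrivener project as a Quarto Book folder tree.",
    )
    # Positional argument replacing the hard-coded SCRIV_FILE.
    parser.add_argument("scriv_file", type=Path, help="path to the .scriv package")
    # Option replacing the hard-coded base_dir.
    parser.add_argument(
        "--base-dir",
        type=Path,
        default=Path("quarto_output"),
        help="directory that receives the Quarto project",
    )
    # Repeatable option replacing folders_to_exclude={'Planning'}.
    parser.add_argument(
        "--exclude",
        action="append",
        default=[],
        metavar="FOLDER",
        help="Binder folder to skip (may be given more than once)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.scriv_file, args.base_dir, set(args.exclude))
```

A call might look like: python squarto.py scrivener_file.scriv --base-dir quarto_output --exclude Planning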

The move_content_files() function will detect if the file is in .rtf format, in which case it calls convert_rtf_to_plaintext(). This function calls Pandoc to handle the conversion:

output = pypandoc.convert_file(rtf_path, format='rtf', to='plain')

Scrivener has its own codes in the .rtf, so all of these need to be stripped out, and I handle this and some other formatting issues (for whitespace, etc.) in a function called process_plaintext().
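
The thread does not show what process_plaintext() does internally, so the following is only a guess at the general shape of such a cleanup step. The function name clean_plaintext and the marker pattern are assumptions. The pattern targets tokens shaped like "<$Scr_Ps::0>" and "<!$Scr_Ps::0>" and should be checked against what actually appears in your converted text.

```python
import re

# Assumed shape of leftover Scrivener markers, e.g. "<$Scr_Ps::0>" or "<!$Scr_Cs::1>".
# This pattern is a guess for illustration; adjust it to match real output.
SCRIVENER_MARKER = re.compile(r"<!?\$Scr_[A-Za-z]+::\d+>")


def clean_plaintext(text: str) -> str:
    """Strip marker tokens and normalize whitespace in converted plain text."""
    text = SCRIVENER_MARKER.sub("", text)
    # Normalize Windows line endings.
    text = text.replace("\r\n", "\n")
    # Drop trailing whitespace on every line.
    text = "\n".join(line.rstrip() for line in text.split("\n"))
    # Collapse runs of blank lines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # End the file with exactly one newline.
    return text.strip() + "\n"


if __name__ == "__main__":
    raw = "<$Scr_Ps::0>Hello  \r\n\r\n\r\n\r\nWorld<!$Scr_Ps::0>\n"
    print(repr(clean_plaintext(raw)))
```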

The end result of all this is that I have a fully valid Quarto Book project (folders and files) saved to disk. This can be previewed or rendered via Quarto. It can be submitted to version control. It can be zipped and shared with others who don’t have Scrivener but do use Quarto.

[1] As you can see, I tend to use verbose but descriptive function names in Python. This makes the code self-documenting, so I don’t need to add (unnecessary) comments all over the place. In the example here, though, I added some extra comments to help the reader.

Cavalierex · September 20, 2024, 9:10pm

The structure of the Binder (contained in the .scrivx XML) and even the other files in the .scriv package are very tidy and interesting, and they are evidence of the forward-thinking ingenuity of Keith Blount, principal developer of Scrivener. But I can leave explanation of the Binder and these other files for another time if you (or another reader) are interested. It ends up being very easy to work with.^[1]

[1] Well, easy to work with after converting from XML to the much friendlier JSON/dict.
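
For readers curious what "converting from XML to JSON/dict" might look like, here is a minimal sketch with the standard library. The element and attribute names (Binder, BinderItem, Title, Children, UUID, Type) are assumptions about the .scrivx layout, the sample XML is invented, and binder_item_to_dict and binder_to_dicts are hypothetical helper names.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample; a real .scrivx carries many more elements and attributes.
SAMPLE = """
<ScrivenerProject>
  <Binder>
    <BinderItem UUID="A1" Type="Folder">
      <Title>APPENDICES</Title>
      <Children>
        <BinderItem UUID="B1" Type="Text"><Title>glossary</Title></BinderItem>
      </Children>
    </BinderItem>
  </Binder>
</ScrivenerProject>
"""


def binder_item_to_dict(item: ET.Element) -> dict:
    """Convert one BinderItem element, and its descendants, into a plain dict."""
    children = item.find("Children")
    child_items = children.findall("BinderItem") if children is not None else []
    return {
        "uuid": item.get("UUID"),
        "type": item.get("Type"),
        "title": item.findtext("Title", default=""),
        "children": [binder_item_to_dict(child) for child in child_items],
    }


def binder_to_dicts(scrivx_text: str) -> list:
    """Return the top-level Binder items as a list of nested dicts."""
    root = ET.fromstring(scrivx_text.strip())
    return [binder_item_to_dict(item) for item in root.findall("./Binder/BinderItem")]


if __name__ == "__main__":
    # Once the Binder is a list of dicts, json.dumps gives a readable view.
    print(json.dumps(binder_to_dicts(SAMPLE), indent=2))
```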