Intuitive Way to Add Custom Segmentation for Text-Based Formats

For a non-tech-savvy person, creating an SRX file with segmentation rules can be time-consuming: you need to go through the specification and compose the rules using the XML vocabulary. On the other hand, merging and splitting strings manually can take even longer. Luckily, there’s now an intuitive way to add custom segmentation rules, test them, and apply them to similar strings automatically.

In this post, we’ll take a closer look at content segmentation, discuss why it is important in localization, and give you step-by-step instructions on how you can quickly add custom segmentation with the Segmentation Rules Generator app.

How and Why Crowdin Divides Your Content Into Segments

When you upload a file to Crowdin in a non-key-value format such as DOCX, HTML, XML, or MD, the system divides its content into strings (segments) based on the SRX 2.0 standard. SRX stands for Segmentation Rules eXchange, an established XML-based standard that describes how translation and other language-processing tools should divide text into fragments.
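
To make the idea of rule-based segmentation concrete, here is a minimal Python sketch of how ordered break and no-break rules can split a text. The rule set, the `segment` function, and the sample sentence are illustrative assumptions, not Crowdin's actual defaults or implementation.

```python
import re

# Each rule is (is_break, before_pattern, after_pattern), checked in order.
# The first rule that matches at a position decides: break or no break.
RULES = [
    (False, r"\b(?:e\.g|i\.e|etc)\.", r"\s"),  # exception: common abbreviations
    (True, r"[.?!]+", r"\s"),                  # break after sentence-ending punctuation
]

def segment(text, rules=RULES):
    """Split text into segments using ordered break / no-break rules."""
    compiled = [
        (is_break, re.compile(rf"(?:{before})\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # "before" must match text ending at pos, "after" text starting at pos.
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins
    segments.append(text[start:].strip())
    return [s for s in segments if s]

text = "Upload files, e.g. DOCX or HTML. Crowdin splits them! Ready?"
print(segment(text))                  # three segments, "e.g." stays intact
print(segment(text, rules=RULES[1:])) # four segments, split after "e.g."
```

Rule order matters in this sketch: the exception is listed before the general break rule, so it is evaluated first.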

Text segmentation makes the Translation Memory (TM) more usable. When longer pieces of text are divided into smaller ones, you can reuse TM suggestions at different similarity levels, which is much harder to do with long copy.
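
The effect on TM matching can be illustrated with a small Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure; this is an assumption for illustration only, and Crowdin's real TM matching works differently. The sample sentences are made up.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in similarity score between 0.0 and 1.0 (not Crowdin's algorithm)."""
    return SequenceMatcher(None, a, b).ratio()

def best_match(segment, memory):
    """Return (score, entry) for the closest TM entry."""
    return max((similarity(segment, entry), entry) for entry in memory)

# Hypothetical TM content and a new text that reuses part of it.
tm_segments = [
    "Open the project settings.",
    "Click Save to apply your changes.",
]
new_segments = [
    "Open the project settings.",
    "Click Apply to confirm your changes.",
    "Restart the app afterwards.",
]

# Compared as one block, the new text is only a partial match.
whole_text_score = similarity(" ".join(new_segments), " ".join(tm_segments))

# Compared segment by segment, the unchanged sentence is an exact match.
per_segment = [(seg, *best_match(seg, tm_segments)) for seg in new_segments]

print(f"whole text: {whole_text_score:.2f}")
for seg, score, entry in per_segment:
    print(f"{score:.2f}  {seg!r} ~ {entry!r}")
```

In this sketch the first sentence scores 1.0 on its own, while the unsegmented text never reaches a full match.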

Two Approaches to Custom Segmentation Combined

Now imagine you’re localizing an app called Bingo! in Crowdin. It’s a great app that helps users arrange ideas during brainstorming sessions. And yes, its name has an exclamation mark at the end. This means that, with segmentation based on the SRX 2.0 standard, every string where this name appears is split in two.

Previously, to fix this, you had to either go through the content and merge the strings manually or create an SRX file with the segmentation rules, as in this sample. The first approach works if there are only a few strings to correct; the second works if you’re comfortable with the SRX specification and XML formatting.
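
As a rough picture of the second approach, the Python sketch below embeds a small SRX 2.0 document with a no-break exception for Bingo!, reads its rules with the standard library, and checks what they decide right after the exclamation mark. The rules, the `load_rules` and `decide` helpers, and the sample string are illustrative assumptions; a real file should be checked against the SRX 2.0 specification.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative SRX 2.0 document; the rules are examples, not Crowdin's defaults.
SRX_TEXT = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="no">
    <formathandle type="start" include="no"/>
    <formathandle type="end" include="yes"/>
    <formathandle type="isolated" include="no"/>
  </header>
  <body>
    <languagerules>
      <languagerule languagerulename="default">
        <rule break="no">
          <beforebreak>Bingo!</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <rule break="yes">
          <beforebreak>[.?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern=".*" languagerulename="default"/>
    </maprules>
  </body>
</srx>"""

NS = {"srx": "http://www.lisa.org/srx20"}

def load_rules(srx_text):
    """Read (is_break, before, after) tuples from an SRX document, in order."""
    root = ET.fromstring(srx_text)
    rules = []
    for rule in root.iterfind(".//srx:rule", NS):
        is_break = rule.get("break", "yes") == "yes"
        before = rule.findtext("srx:beforebreak", default="", namespaces=NS)
        after = rule.findtext("srx:afterbreak", default="", namespaces=NS)
        rules.append((is_break, before, after))
    return rules

def decide(text, pos, rules):
    """Return True (break), False (no break), or None (no rule matched) at pos."""
    for is_break, before, after in rules:
        if re.search(rf"(?:{before})\Z", text[:pos]) and re.match(after, text[pos:]):
            return is_break
    return None

rules = load_rules(SRX_TEXT)
text = "Open Bingo! to start a session."
pos = text.index("!") + 1

print(decide(text, pos, rules))      # False: the exception keeps the string whole
print(decide(text, pos, rules[1:]))  # True: without it, the string is split here
```

Note that an exception this broad also prevents a break when a sentence genuinely ends with the app name, which may or may not be what you want.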

To simplify creating custom segmentation rules, we launched the Segmentation Rules Generator app, which combines the two approaches. In the app’s visual interface, you can create the SRX for a specific file, add rules, and go through the strings, merging and splitting them where necessary. As you do, the SRX file with the rules is updated automatically so that you can apply similar rules to multiple strings.

With the app, you can also upload the SRX files you may have from previous projects or tools, and preview how the content is segmented based on the rules you’ve added.

Add Custom Segmentation to Your Localization Files With Ease

To try out the new approach and change segmentation within a specific file, follow these steps:

  1. Install the Segmentation Rules Generator app on Crowdin or Crowdin Enterprise.

In Crowdin, go to Resources in the menu bar and select Marketplace in the drop-down menu.
If you use Crowdin Enterprise, use the left-side menu of your workspace to open the Marketplace.
Learn more about Crowdin Store.

  2. Open the app.

In Crowdin, go to Project Settings > Integrations, scroll down to Applications, and select the app.
In Crowdin Enterprise, go to Project Home > Applications > Custom > Segmentation Rules Generator.

  3. Within the app, you will see two tabs:
  • SRX Editor - where you can upload or create an SRX file.
  • Apply SRX to files - where you can configure which files the uploaded or created SRX will be applied to.
  4. Choose Create SRX > Rules > Add rule.

Note: Currently, you can create segmentation rules for the following file formats: DOCX, MD, HTML, DITA, IDML, TXT, and XML.

  5. Then, select the necessary project file. Once you select the file, you’ll see the existing segmentation rules and all the strings you can split or join.

  6. To split a string, put the cursor in the necessary place and click the scissors icon. Clicking the two-arrow icon next to a string merges it with the previous one.

  7. After you change the segmentation via the UI, the necessary rule is added to the file and automatically applied to similar strings. You can also edit the existing rules and preview the newly arranged segmentation directly in the app.

  8. Save the SRX file and use it to change the segmentation in specific files. To do this, go to the Apply SRX to files tab and configure which project files or folders the SRX will be applied to.

If you select a folder and click the icon next to the folder name, the app will create a webhook. It ensures that the SRX rules you chose are applied to every file newly added to this folder.
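
The step where a manual merge turns into a reusable rule can be pictured with the Python sketch below. The app's internals are not described in this post, so this is purely an assumption about one plausible approach: take the last token of the segment that was merged with the next one and turn it into a no-break rule. The `no_break_rule_for` helper and the sample strings are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def no_break_rule_for(merged_left):
    """Build a hypothetical SRX no-break rule from the left part of a merge.

    Assumes merged_left is non-empty and ends with the token that caused
    the unwanted split, for example "Open Bingo!".
    """
    last_token = merged_left.split()[-1]
    rule = ET.Element("rule", {"break": "no"})
    ET.SubElement(rule, "beforebreak").text = re.escape(last_token)
    ET.SubElement(rule, "afterbreak").text = r"\s"
    return rule

# The user merged "Open Bingo!" with the segment that followed it.
rule = no_break_rule_for("Open Bingo!")
xml_snippet = ET.tostring(rule, encoding="unicode")
print(xml_snippet)

# The same rule then covers similar strings without further manual merges.
pattern = re.compile(rule.findtext("beforebreak") + r"(?=\s)")
similar = ["Try Bingo! today.", "Bingo! helps teams.", "Nice work! Keep going."]
covered = [s for s in similar if pattern.search(s)]
print(covered)
```

Only the strings that contain the app name are affected; ordinary sentences ending in an exclamation mark are still split as before.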

Translators Will Still See the Whole Text for Reference

Even though segmentation is crucial for creating Translation Memories, translators often need to see the whole text to get the main idea and the necessary context. Translating string by string might affect translation quality, and that’s where Crowdin WYSIWYG comes in handy.

In the Crowdin Editor, translators can switch between views and, if necessary, preview the whole document with the pictures, tables, columns, and lists it might contain.

Learn more about the context for translators.


Explore More Crowdin Apps

Custom segmentation is usually a set-it-and-forget-it type of configuration. So once you’re done with it, discover more useful apps on Crowdin Store. They will help you customize your company’s localization experience and get more out of Crowdin and Crowdin Enterprise.

Go to Crowdin Store.

Iryna Namaka
