Soft hyphen

In computing and typesetting, a soft hyphen (ISO 8859: 0xAD, Unicode U+00AD SOFT HYPHEN, HTML: &#xAD; or &#173; or &shy;) or syllable hyphen (EBCDIC: 0xCA), abbreviated SHY, is a code point reserved in some coded character sets for breaking words across lines by inserting visible hyphens. Two alternative ways of using the soft hyphen character for this purpose have emerged, depending on whether the encoded text will be broken into lines by its recipient or has already been preformatted by its originator.

The use of SHY characters in text that will be broken into lines by the recipient is the application context considered by the post-1999 HTML and Unicode specifications, as well as by some word-processing file formats. In this context, the soft hyphen may also be called a discretionary hyphen or optional hyphen. It serves as an invisible marker of a place in the text where a hyphenated break is allowed, without forcing a line break at an inconvenient place if the text is re-flowed. It becomes visible only when word wrapping breaks the line at that position. The soft hyphen's Unicode semantics and HTML implementation are in many ways similar to those of Unicode's zero-width space, except that the soft hyphen preserves the kerning of the characters on either side when it is not visible. The zero-width space does not, as it is considered a visible character even when not rendered and thus has its own kerning metrics.

To show the effect of a soft hyphen in HTML, the words of the following text have been separated with soft hyphens:

Margaret­Are­You­Grieving­Over­Goldengrove­Unleaving­Leaves­Like­The­Things­Of­Man­You­With­Your­Fresh­Thoughts­Care­For­Can­You­Ah­As­The­Heart­Grows­Older­It­Will­Come­To­Such­Sights­Colder­By­And­By­Nor­Spare­A­Sigh­Though­Worlds­Of­Wanwood­Leafmeal­Lie­And­Yet­You­Will­Weep­And­Know­Why­Now­No­Matter­Child­The­Name­Sorrows­Springs­Are­The­Same­Nor­Mouth­Had­No­Nor­Mind­Expressed­What­Heart­Heard­Of­Ghost­Guessed­It­Is­The­Blight­Man­Was­Born­For­It­Is­Margaret­You­Mourn­For

In web browsers that support soft hyphens, resizing the window re-breaks the above text only at word boundaries and inserts a hyphen at the end of each broken line.
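The behaviour described above can be imitated with a short Python sketch. It is a toy model, not how any browser actually lays out text: it assumes a fixed-width font, ignores spaces, and always reserves one column for the hyphen. The function name wrap_at_soft_hyphens is made up for this illustration.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def wrap_at_soft_hyphens(text: str, width: int) -> list[str]:
    """Greedy wrap that breaks only at soft hyphens (toy model)."""
    lines: list[str] = []
    current = ""
    for piece in text.split(SHY):
        # Reserve one column for the visible hyphen that ends a broken line.
        if current and len(current) + len(piece) + 1 > width:
            lines.append(current + "-")  # the hyphen appears only at a break
            current = piece
        else:
            current += piece  # unused soft hyphens leave no trace
    if current:
        lines.append(current)
    return lines


word = "Mar" + SHY + "ga" + SHY + "ret"
print(wrap_at_soft_hyphens(word, 6))    # ['Marga-', 'ret']
print(wrap_at_soft_hyphens(word, 100))  # ['Margaret']
```

A soft hyphen that is not used as a break point produces no output at all, which is the sense in which the character is invisible until the line is wrapped there.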

The SHY character is also used in text where paragraphs have already been broken into lines, such as certain plain text files, text sent to VT100-style terminal emulators or printers, or pages represented in page description languages. This is the application context originally considered by the EBCDIC and ISO 8859-1 standards and implemented in many VT100 terminal emulators.

Here, SHY is a visible hyphen that is usually visually indistinguishable from a regular hyphen, but has been inserted solely for the purpose of line breaking. Encoding it as a separate character distinguishes it from any regular hyphen that is part of the original spelling of the word. This distinction helps when already formatted text is reused and the line breaks and soft hyphens inserted during word wrapping have to be removed to convert the text back into its unformatted form. For example, the copy-and-paste function of a terminal emulator can offer to replace line breaks with a space character and to remove any soft hyphens, together with any whitespace characters that immediately follow them.

An example application that outputs soft hyphens for this reason is the groff text formatter as used on many Unix/Linux systems to display man pages.
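The conversion back to unformatted text described above can be sketched in Python as follows. This is a minimal illustration under the assumption that every hyphen introduced by word wrapping was encoded as U+00AD. The function name unformat is hypothetical and is not taken from any terminal emulator.

```python
import re

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def unformat(preformatted: str) -> str:
    """Undo word wrapping in text whose break hyphens are soft hyphens."""
    # Remove each soft hyphen together with the whitespace that follows it
    # (including the line break), rejoining the two halves of the word.
    text = re.sub(SHY + r"\s*", "", preformatted)
    # Replace every remaining line break with a single space.
    return re.sub(r"\s*\n\s*", " ", text).strip()


wrapped = "Soft hy" + SHY + "\nphens differ from the hyphen in well-known\nwords."
print(unformat(wrapped))
# Soft hyphens differ from the hyphen in well-known words.
```

The regular hyphen in "well-known" survives because it is a different character (U+002D), which is the distinction the soft hyphen is meant to preserve in this context.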

SHY characters in coded character sets, roughly in chronological order:

  • EBCDIC placed a SHY character (known there as a "syllable hyphen") at position 202 (0xCA hexadecimal). IBM defined its purpose as a "hyphen used to divide a word at the end of a line [that] may be removed when a program adjusts lines."
  • German standard DIN 31626 defined a C1 control code set in which 0x8D is an "Optional Syllabification Control (OSC)", a "print control character" for use in marking syllable boundaries in long words. This C1 control set was registered in 1979. (Note: this is not the same as the ISO/IEC 6429 C1 control code Operating System Command (OSC).)
  • ISO 8859-1:1986 (Latin 1) inherited SHY from EBCDIC, but called it "soft hyphen", placed it at position 0xAD (hexadecimal), and stated its purpose as "for use when a line break has been established within a word". Other ISO 8859 parts placed it at the same position, with the exception of ISO 8859-11 (Latin/Thai), which lacks it.
  • IBM code page 850 (an MS-DOS character set covering all ISO 8859-1 characters) placed it at position 240 = 0xF0.
  • SGML's "Numeric and Special Graphic" (isonum) character entity set (ISO 8879:1986) includes "&shy;" for the ISO 8859-1 soft hyphen.
  • Unicode 1.0 (1991) and ISO 10646 (1993) took the first 256 code positions from ISO 8859-1, which placed SHY at Unicode code point U+00AD.
  • HTML 2 (1995) incorporated the "&shy;" character entity from SGML, but explicitly discouraged its use.
  • HTML 4 (1999) redefined the purpose of the character as marking a hyphenation opportunity, which only becomes visible as a hyphen at the end of a line after formatting.
  • Unicode 4.0 (2002) changed the category of its SHY character from "Pd" (punctuation, dash) to "Cf" (other, format), thereby aligning its interpretation of the character with that of HTML 4.
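Several of the positions listed above can be checked with Python's standard codecs. The sketch below makes some assumptions: cp037 stands in for EBCDIC (which is a family of code pages, of which this is only one), and the helper decode_byte is made up for the illustration.

```python
import html
import unicodedata

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def decode_byte(byte: int, codec: str) -> str:
    """Decode a single byte using the named Python codec."""
    return bytes([byte]).decode(codec)


from_ebcdic = decode_byte(0xCA, "cp037")    # one EBCDIC code page
from_latin1 = decode_byte(0xAD, "latin-1")  # ISO 8859-1
from_cp850 = decode_byte(0xF0, "cp850")     # IBM code page 850

# ISO 8859-11 (Latin/Thai) has no SHY; 0xAD decodes to a Thai letter there.
from_thai = decode_byte(0xAD, "iso8859_11")

# The three HTML spellings all resolve to the same code point.
from_html = [html.unescape(ref) for ref in ("&shy;", "&#xAD;", "&#173;")]

# General category as reported by the Unicode database bundled with Python.
category = unicodedata.category(SHY)

print(from_ebcdic == from_latin1 == from_cp850 == SHY)
print(unicodedata.name(SHY), category)
```

The reported category depends on the Unicode version shipped with the Python interpreter; any current release reflects the post-4.0 classification.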

Other commands for marking hyphenation opportunities in text formatting languages (similar to the HTML 4 and Unicode 4.0 interpretation of SHY):

  • In troff and groff: \%
  • In TeX and LaTeX: \-

Soft hyphens have been used to obscure malicious domains or URLs in e-mail spam.
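One plausible countermeasure is to remove soft hyphens before comparing a URL against a list of known bad domains. The Python sketch below illustrates only that idea and is not a description of how any particular spam filter works. The names strip_soft_hyphens and is_blocked, the domain example.test, and the naive substring match are all assumptions made for the example.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_soft_hyphens(text: str) -> str:
    """Remove every soft hyphen so that hidden characters cannot split a match."""
    return text.replace(SHY, "")


def is_blocked(url: str, blocklist: set[str]) -> bool:
    """Naive substring check against a blocklist, applied after normalisation."""
    cleaned = strip_soft_hyphens(url).lower()
    return any(domain in cleaned for domain in blocklist)


blocklist = {"example.test"}  # hypothetical blocked domain
obfuscated = "http://exa" + SHY + "mple.te" + SHY + "st/offer"

print("example.test" in obfuscated)       # False: the raw string evades the check
print(is_blocked(obfuscated, blocklist))  # True once soft hyphens are stripped
```

A real filter would parse the URL and compare host names properly. The point here is only that the invisible character has to be removed before any comparison is made.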

  1. Jukka Korpela (January 2011). "Soft hyphen (SHY) – a hard problem?". Tampere University of Technology. Retrieved 8 April 2011.
  2. Markus G. Kuhn (4 June 2003). "Unicode interpretation of SOFT HYPHEN breaks ISO 8859-1 compatibility" (PDF). Unicode Technical Committee. L2/03-155R.
  3. Eric Muller (14 August 2002). "Yes, SOFT HYPHEN is a hard problem". Unicode Technical Committee. L2/02-279.
  4. Demonstration text is from the poem "Spring and Fall: to a young child" by Gerard Manley Hopkins.
  5. "Extended Binary-Coded Decimal Interchange Code - S/390". comsci.us. Retrieved 8 April 2011.
  6. "Glossary". IBM. Retrieved 8 April 2011.
  7. DIN (15 July 1979). Additional Control Functions for Bibliographic Use according to German Standard DIN 31626 (PDF). ITSCJ/IPSJ. ISO-IR-040.
  8. "Commonly Confused Characters". Greg Baker, Simon Fraser University. Retrieved 12 July 2011.
  9. "Spammers Using Soft Hyphen To Hide Malicious URLs". Slashdot. 7 October 2010. Retrieved 8 April 2011.
  10. "Soft Hyphen – A New URL Obfuscation Technique". Symantec. Retrieved 8 April 2011.
