Python: remove spaces between Chinese Unicode characters but not between English words

I need help with a Python regex. I have a string that contains both Chinese and English, and I would like to remove the whitespace between Chinese characters but not between English words.

From: u'\u5c0f \u5973 \u4eca \u5e74 \u4fc2 dse \u8003 \u751f \u5979 \u559c \u6b61 filmtv \u524d \u5e7e \u65e5 in \u5de6 buasso-filmtv and digital media studies \u5df2 \u7d93 condition offer \u4f46 \u60f3 \u554f \u5982 \u679c through jupas openu \u6536 \u5979 \u8b80 bachelor of arts with honours In creative writing and filmarts'

To: u'\u5c0f\u5973\u4eca\u5e74\u4fc2 dse \u8003\u751f\u5979\u559c\u6b61 filmtv \u524d\u5e7e\u65e5 in \u5de6 buasso-filmtv and digital media studies \u5df2\u7d93 condition offer \u4f46\u60f3\u554f\u5982\u679c through jupas openu \u6536\u5979\u8b80 bachelor of arts with honours In creative writing and filmarts'

In other words, a space should be removed only when it sits between two Unicode (Chinese) characters.

Updated: 2023-02-04 10:02

Best answer

If you're fine with defining "Unicode characters" as "non-ASCII" characters, then you can do this with a negative lookbehind and a negative lookahead:

re.sub("(?<![ -~]) (?![ -~])", "", text)
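To make the behaviour concrete, here is a minimal sketch that applies the pattern above to the start of the sample string from the question. The names NON_ASCII_GAP and strip_non_ascii_gaps are made up for this illustration, and the code assumes Python 3, where every str is already Unicode.

```python
import re

# A single space with no printable ASCII character (space through tilde)
# immediately before it and none immediately after it.
NON_ASCII_GAP = re.compile(r"(?<![ -~]) (?![ -~])")


def strip_non_ascii_gaps(text):
    """Remove spaces whose neighbours on both sides are non-ASCII."""
    return NON_ASCII_GAP.sub("", text)


sample = ("\u5c0f \u5973 \u4eca \u5e74 \u4fc2 dse "
          "\u8003 \u751f \u5979 \u559c \u6b61 filmtv")
print(strip_non_ascii_gaps(sample))
# 小女今年係 dse 考生她喜歡 filmtv
```

A space next to an English word always has an ASCII letter on one side, so one of the two lookarounds fails and the space is kept. Note that a run of two or more spaces between Chinese characters is left untouched, because each space in the run has another space (itself printable ASCII) as a neighbour.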

If you don't like the range used ([ -~], which covers every printable ASCII character from space to tilde), then this question has some alternatives. Additionally, there are a variety of Unicode categories that might serve your purpose better, but as far as I can tell you would still have to define the character range manually, because Unicode category classes are unsupported in the re module.
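As a rough illustration of defining the character range by hand, the sketch below swaps the "non-ASCII" test for an explicit range. It is a variant, not the pattern from the answer: it assumes that "Chinese character" means the CJK Unified Ideographs block (U+4E00 to U+9FFF), which leaves out extension blocks and CJK punctuation, and the names CJK_RANGE, CJK_GAP and remove_cjk_gaps are invented for the example.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block counts as Chinese.
CJK_RANGE = "\u4e00-\u9fff"

# One or more spaces with a CJK ideograph directly before and directly after.
CJK_GAP = re.compile(r"(?<=[{0}]) +(?=[{0}])".format(CJK_RANGE))


def remove_cjk_gaps(text):
    """Remove runs of spaces that sit between two CJK ideographs."""
    return CJK_GAP.sub("", text)


print(remove_cjk_gaps("\u5c0f \u5973 dse \u8003  \u751f"))
# 小女 dse 考生
```

Compared with the non-ASCII version, this one leaves spaces between other non-ASCII characters (accented Latin letters, for example) alone, and the " +" quantifier also collapses runs of several spaces between ideographs.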
